It is quite usual that customers ask for a performance study for their systems. Performance is a big pain in the ass because as I always said it depends on the point of view of everyone. Lets take FBL3N transaction for example. If we execute it without any filter it will probably take ages to show the data, even if it finish correctly (no memory dump). Does this mean we have a performance issue within the system? Absolutely no, it means the user doesn’t know how to use the transaction/report properly…
Why am I telling this? Because a lot of times we won’t have a performance issue. Even with that we need to know when there is a performance issue on the system so, how do we do it? I want to show you a small performance study I did for an SAP ERP system a year ago.
Getting started
First idea is to know how we can measure the system performance and which tools we have. In this case my client wanted to check an SAP ERP. The details of the SAP system are:
- SAP Product: SAP ERP 6.0 EHP4.
- Kernel: 721_EXT Patch 500.
- Operating System: SLES 11 Sp3.
- Database: Oracle 11.2.0.2
- Architecture: 2 servers, both on the same subnetwork.
- DB server.
- SAP Instance (ASCS + CI).
First thing we should do to check if there is a performance issue is to execute transaction ST03N. On this transaction you have the response times for each of the task types, including the parts of response time for each of elements of the system. So if just took the response time, CPU time, etc. of the last 3 months and I did some graphics and tables:
Considering the data provided on both the table and the graphic, is there a performance issue on the system? The recommended value for the Average Response Time per Dialog Step is between 1.000ms and 1.500ms. On this example we have an Average Response Time of 1.089 which is higher than the lower threshold of the recommended value. It is not good but it is not the end of the world… Anyway, we can improve the response time so the next step is to check which elements can be improved. We check back the table and see which values has “Yes” on the Anomaly field:
- Ø DB Time (ms): Related with the database response time. High time on database could be caused by bad parameters, problems with the I/O, network issue, etc.
- Average Roll In Time (ms) and Average Roll Wait Time (ms): Related with the roll time. High time here could be caused by network issue (not always).
You can see it easier on the following table, the Database and Roll Wait time have the higher percentage:
Once we found which elements have the higher response time the next step is to check how they are working and how they can be improved. In this case we focus on the database since it was the element with higher possibilities of improving.
Having a small talk with the database
We can do several actions on the database so we can improve the response time:
Database Version:
It is really important to check the database version to see if it is up to date. In this case the version was 11.2.0.2 and the SPB was 11.2.0.2.8.201211. If we check the Support Portal we can find which is the latest version and the latest SPB available. Considering the version and SPB it is a good idea to update the database to 11.2.0.5 at least and install the latest SPB.
Newer database version and SPB could fix performance issues, security issues, etc. SAP Support can ask you to update your database version if they see it is not up to date, remember to update the version frequently.
Database Parameters:
SAP releases different notes showing the proper parameters for the database depending on its version. You have notes available for version 9i, 10g, 11g and 12c. It is really important to understand that SAP needs some parameters with specific values. It doesn’t matter you have been administrating Oracle databases for a thousand of years, just follow the SAP Note.
In this case the SAP Note that shows the parameters for Oracle 11g is 1431798 – Oracle 11.2.0: Database Parameter Settings. If you want to check automatically the database parameter you can use the script from SAP Note 1171650 – Automated Oracle DB parameter check. If you want to update the database parameters automatically you can use the Linux script included in SAP Note 2124058 – Oracle Parameter Updater. I’m not going to write the parameters I changed, just remember to adjust the event parameters properly!
Database analysis:
Next step is to do a small database analysis using the Workload Reporting on transaction ST04 (you can also use DB02 or DBACOCKPIT transactions). Choose the begin and end snapshots and click on AWR Report:
Don’t be afraid, this report will show a lot of information and you could feel a little bit overwhelmed. First step is to check the database buffers. As you can see in the next screenshot, the Hit Ratio for the database buffers is quite good:
On the database event side we can see that the first one is db file sequential read. This could be caused by different issues, we have to dig deeper:
If we take a deeper look we see that the % DB time if far superior for the db file sequential read compared with the rest of events in the database:
The database events on a chart looks like the following image.
About the wait class within the database the first one is User I/O:
So it is quite clear we have an I/O problem within the database, right? Next step is to discover what could be causing the high I/O times so we can fix it and we can improve the I/O times and the database performance. In order to improve the I/O within the database we can:
- Improve the performance of the disks that contains the datafiles, doing a tune in the operating system or changing them for a better disks.
- Increase the value of the parameters related to database memory (SGA, PGA, Buffers, etc). so we will increase the chance of reading from memory before doing a disk operation.
- Optimize the performance of the SQL queries reducing the number of reads and writes (if possible).
Data files performance:
On the AWR Report there is a point showing the performance of the database datafiles. As you can see in the following image there are some datafiles with high read time
The idea is to check the performance of the disks that contains those data files in order to see if there is any problem with the disk. You can check the I/O performance in the operating system using the command iostat -Nxm 1 -d. Check if there is any filesystem with high Wait and Util values and these filesystems are the same ones that appears on the report.
Reads per table:
I checked on the EWA Report which was the table with most physical and logical reads on the database. The first one on the physical reads side was a Z table with a 60% of physical reads. On the logical reads side the first one was another Z table:
I checked the queries executed on these tables and I found that the queries did a full scan of the table (SELECT * FROM). There were a couple of times when they used some fields on the WHERE clause but there was no index on the table for this fields. The idea here was to optimize the queries as much as possible for no full scans will be done to these tables. Just remember, less full scans means less reads on the table wich means less read on the database which means better I/O.
Database memory:
Another point to reduce the database I/O is the database memory parameters related with the SGA, PGA, etc. We have to distinct between PGA and SGA (sorry but I’m going to quote the Oracle documentation):
- PGA: A PGA is a memory region that contains data and control information for a server process. It is nonshared memory created by Oracle Database when a server process is started. Access to the PGA is exclusive to the server process.
- SGA: The SGA is a group of shared memory structures, known as SGA components, that contain data and control information for one Oracle Database instance. The SGA is shared by all server and background processes.
From version Oracle 11g it is possible to use the Oracle Automatic Memory Management so it is not required to correctly set the SGA and PGA parameters. You only have to set the parameters MEMORY_TARGET and MEMORY_MAX_TARGET and Oracle will manage the memory automatically. Anyway I wanted to set the PGA and SGA parameters and not use the AMM so I checked the current values:
- SGA:
- PGA:
The values were not bad but they could be improved. For the SGA you can execute the following query in your database and it will show the recommend value:
1 |
SELECT * FROM v$sga_target_advice ORDER BY sga_size; |
In this database there was no value returned from that query so I decided to raise the value for the SGA Max Size to 14GB considering the use and size of the database. Also related with the SGA, we checked the recommended values for the Data Buffer and Shared Pool. On transaction ST04 we can see the recommended values for the Data Buffer:
As you can see if we raise the Data Buffer from 4896 to 9600 the disk time goes from 85,30% to 41,50% so I changed the parameter DB_CACHE_SIZE to the recommended value. About the Shared Pool, the ratios were quite good so I decided not to change it:
About the PGA, the current value was 1,6GB and the cache hit percentage was 96%. Checking the recommende values I saw that the cache hit percentage didn’t improve so much so I left the PGA with the same value:
Database statistics:
Another important topic in the database side is the database statistics. I just lost the count of how many database I saw with really good hardware and correct parameters but with a crappy performance because of the lack of statics. The job for creating the database should be executed every night on transaction DB13. I checked the DB13 schedule on the system and I found the following statistics job:
The interesting part is the last one:
-t ALL -c 1 -p 10
These parameters mean we will generate the statistics for all tables (-t ALL) with at least 1% changes from the last optimizing job (-c 1) and with 10 parallel process. This job should be enough as a daily maintenance task. Anyway, we will schedule a monthly task to generate deeper statistics. This should be done during a maintenance window so it won’t affect the users:
brconnect -u / -c -f stats -t oradict_stats >> statistics_log.txt
brconnect -u / -c -f stats -t system_stats >> statistics_log.txt
brconnect -u / -c -f stats -t info_cubes -c 1 -p 6 >>statistics_log.txt
brconnect -u / -c -f stats -o SAPSR3 -t all -m +I -s P10 -f allsel,collect,method,precision,keep -p 4 >>statistics_log.txt
The 4 commands took about 12 hours to be completed and they were executed with the SAP instance stopped.
Database growth:
The current size of the database is 1.540GB with a free space of 77,15. This makes a total size of 1553,30GB
The database growth during the last months is the following one:
The total growth during the last 12 months is about 400GB which is quite high. Considering this we can say we have a problem on the long term with the database size. A bigger database size means that queries to the database will take longer if they are not optimizing since the amount of data contained is higher. In order to reduce the size we can do the following actions:
- Export/Import of the database, schema, etc. to reclaim the unused space in the datafiles.
- Reorganize the bigger tables in order to reclaim the unused space of deleted entries.
- Data Archiving for old data in order to move that data to long-team archival storage systems.
Option 1 takes a long time and it should be done really carefully. Option 3 is an independent project and it usually takes time. So our choice is to go for option 2 since is available as a short-term action. We take a look to the database objects with higher growth in the past weeks:
- Some Z tables. We check with the developers about this table and it seems some of them contains XML files. Ideally this files should not be kept in the database, they should use a independent filesystem for this at the operating system.
- Some standard tables as BSIS, BALDAT, MSEG, REPOLOAD, TST03. The SAP Note 706478 – Preventing Basis tables from increasing considerably describes how to manage some of this tables in order to avoid big sizes.
We check the tablespaces and we find a tablespace named PSAPSR3731. This tablespace was used for a SPS update long time ago, the objects that contains could be moved to PSAPSR3 tablespace and we can delete it afterwards.
Query analysis:
I checked the queries executed on the database using transaction ST04 and clicking on Perfomance – SQL Stamente Analysis – Shared Cursos Cache. Over there you can sort the queries for disk reads, number of executions, CPU Time, etc. The idea here is to find queries with high values in the number of executions, disk reads, CPU Time, CPU Time/Exec, etc. in order to analyze them. After checking the results I saw that there were a lot of Z queries. I’m sorry but I cannot give you a detail about the queries. The steps followed in order to improve them were:
- Check if the SQL statement can be improved somehow. For example, avoiding the SELECT * on big tables if we are not going to use all the fields or avoiding GROUP BY if we already have an index that will do so. If think you get the idea here.
- Check if the tables has an appropriated index for improving the query performance. Please keep in mind that the index creation should be analyzed since sometimes the index creation could affect the performance of another query in a bad way.
Also I checked the EWA report in order to see the objects that were affected within the database by high CPU use, I/O, Execution Time, etc. You should focus on improve the queries using those objects or take additional measures for improve the queries for those objects. For example, using high I/O disk on the datafiles that contains those objects.
Proposals for Improvement
After finish the study the tasks proposed for improve the performance within the system where the following ones:
Shot term:
- Check the database parameters and set them following the SAP recommendations.
- Check the performance of the disk of the database server.
- Increase the RAM memory and adjust the parameters related to Oracle Memory (specially the SGA).
- Full execution of statistics.
Medium term:
- SQL query analysis and improvement.
- Reduce the size of the bigger objects on the database (export/import, reorganize, etc).
Long term:
- Data archiving project to be defined.
The aftermatch
After performing some of the task described before I checked the response time of the SAP system to see if I could find any improvement on the response time:
As you can see the Average Response Time wen from 1.752 in the worst case to 642,8 in the best case. This is the graph:
I was quite happy with the result we achieve after the whole process. This was the usual case of a SAP system that growths on data, users, etc. after some years without a proper capacity management. It could be an example about how a end-user thinks that there is a performance issue within the system but the overall performance in the system is within the recommended values by SAP. There is always room for a little improvement.
I hope you guys liked the articled, I will appreciate any comment you want to do about it. Cheers!
This a great great article!