I would like to share this case to show how poor planning during instance configuration causes problems sooner or later. I really like this example because it shows perfectly how doing things properly from the beginning avoids a lot of headaches later. That being said, come closer and listen to a wonderful story about SAP performance…
The problem
The end users started to complain about the performance of the system. They said they had not executed any transaction or program heavy enough to explain the amount of resources being consumed. Since it was the middle of August, no month-end closing procedure was running in the SAP system. The system details are:
- SAP ERP 6.00 EHP4.
- SAP MaxDB 7.2.
- SLES 11 SP4.
- Kernel 722 Patch 300.
- Central Instance + DB on same server.
The investigation
In this case the investigation was a little tricky because I started looking into the system at a very specific and bad moment. The initial checks showed that a work process was consuming a large amount of resources. I checked the operating system and found that the memory was almost fully used:
As you can see, process 27743 was consuming about 45.8% of the memory. This process belonged to a work process in the SAP system:
As you can see, the work process is in PRIV mode, meaning that it has reached its quota for extended memory and is now using heap memory. If we look at the operating system, we see that it is using a lot of memory:
Notice in the first screenshot (top command) that the wait percentage is really high (about 57.5% when I took the screenshot). The %wa value in top shows the iowait, which is the share of time the CPU has been waiting for I/O to complete. I ran vmstat to see how %wa changed over time and found something interesting:
This screenshot was not taken at exactly the same moment as the top screenshot, but it is quite similar to what was happening then. The %wa is consistently very high; I even saw values around 70% at certain moments. At this point I concluded that the high iowait was related to the work process in PRIV mode…
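If you want to track the same figure without watching top or vmstat, the iowait share can be derived from two samples of the aggregate cpu line in /proc/stat on Linux. The following is only a minimal sketch under that assumption; the function names are mine and are not part of any SAP or operating system tool.

```python
def cpu_times(stat_line):
    """Parse the aggregate 'cpu' line of /proc/stat into integer tick counters."""
    fields = stat_line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    # Field order: user, nice, system, idle, iowait, irq, softirq, steal
    return [int(value) for value in fields[1:9]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    start, end = cpu_times(before), cpu_times(after)
    deltas = [e - s for s, e in zip(start, end)]
    total = sum(deltas)
    if total == 0:
        return 0.0
    # iowait is the fifth counter (index 4)
    return 100.0 * deltas[4] / total


def read_cpu_line(path="/proc/stat"):
    """Return the first line of /proc/stat, which aggregates all CPUs."""
    with open(path) as handle:
        return handle.readline()
```

To use it, call read_cpu_line() twice a few seconds apart and pass both lines to iowait_percent(). Sustained values like the 57.5% to 70% seen here are a strong hint that the server is waiting on disk.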
Understanding quotas on ABAP instances
Memory on SAP instances is a really big topic that would fill several articles. For this example, what you have to understand is that a work process consumes roll, paging, extended and heap memory. When you define each memory parameter in the instance profile, the SAP system reserves the memory area across both physical and swap memory. The sum of physical and swap memory is the virtual memory. In the following screen we can see how SAP allocates the memory:
It is possible to limit the amount of physical memory that the SAP system will consume. Ideally we want to use as much physical memory as we can because it is faster than swap. In some cases we limit it, for example when the database and the SAP system share the same host. Once the physical memory is used up, the operating system starts paging: it moves memory segments from physical memory out to swap. This drives up the iowait on the CPU, because swap lives on disk and is much slower than RAM.
How can we avoid these situations? It would make no sense for a single work process to be able to use all the memory of the SAP system and the operating system. SAP provides memory quotas, which limit the amount of memory that a work process can use while running. A work process moves on to the next memory area when it reaches the quota of the current one. For dialog work processes the memory areas are used in the order shown in the following screenshot:
You can see the quotas in transaction ST02 by choosing Goto – Current local data – SAP memory – Quotas. It is also possible to modify the quotas without restarting the system using program RSMEMORY:
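To put numbers on this, the virtual memory and the current swap usage can be read from /proc/meminfo on Linux. This is a minimal sketch under that assumption; the function names are mine and the values are converted from kB to GB using 1024 as the factor.

```python
def parse_meminfo(text):
    """Turn the text of /proc/meminfo into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values


def memory_summary(text):
    """Summarise virtual memory (RAM + swap) and swap usage in GB."""
    info = parse_meminfo(text)
    ram_kb = info["MemTotal"]
    swap_kb = info["SwapTotal"]
    swap_used_kb = swap_kb - info["SwapFree"]
    kb_per_gb = 1024 ** 2
    return {
        "ram_gb": ram_kb / kb_per_gb,
        "swap_gb": swap_kb / kb_per_gb,
        "virtual_gb": (ram_kb + swap_kb) / kb_per_gb,
        "swap_used_gb": swap_used_kb / kb_per_gb,
        # Guard against hosts without any swap configured
        "swap_used_pct": 100.0 * swap_used_kb / swap_kb if swap_kb else 0.0,
    }
```

Reading the file with open("/proc/meminfo").read() and passing the text to memory_summary() gives a quick picture. A swap_used_pct that keeps growing together with a high iowait is the pattern described in this case.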
The solution
I checked the memory quotas in the system and found the following information:
As you can see, for dialog tasks the quotas are almost 8GB for extended memory and 48GB for heap memory. This means that a work process can consume 8GB of extended memory and 48GB of heap memory before it is terminated with a dump… I checked the memory areas in transaction ST02 just to be sure how much memory was available:
Considering the quotas defined in the instance, it’s pretty clear that a single work process can take down the whole system. I can execute a transaction and consume as much as 8GB of extended memory and 48GB of heap memory. Take a look at the total amount of memory available on the operating system:
The current swap size is 49GB, almost the same as the heap memory quota defined in the instance profile. So what happened was that the process consumed all the extended memory allowed by its quota. Then it started to consume heap memory, which the quota allows up to 48GB. Since the physical memory was completely used by the work process and the database, the operating system started to move pages from physical memory out to swap. This caused a lot of iowait for every process running on the server.
I talked with the user in charge of the transaction and we decided to cancel the process. After that, the following actions will be performed:
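The sequence described above can be sketched as a small model. It is deliberately simplified: it ignores roll and paging memory and only follows the two steps from this case, extended memory up to its quota and then heap memory (PRIV mode) up to its quota. The function names, the 40GB demand, the 64GB of RAM and the 30GB used by the database are hypothetical values chosen for illustration; only the 8GB and 48GB quotas come from the system.

```python
def simulate_work_process(demand_gb, em_quota_gb, heap_quota_gb):
    """Split a memory demand across extended memory first, then heap memory."""
    extended = min(demand_gb, em_quota_gb)
    heap = min(demand_gb - extended, heap_quota_gb)
    unmet = demand_gb - extended - heap
    if unmet > 0:
        state = "aborted with a dump"       # both quotas exhausted
    elif heap > 0:
        state = "PRIV mode"                 # extended quota reached, using heap
    else:
        state = "extended memory only"
    return {"extended_gb": extended, "heap_gb": heap, "state": state}


def swapped_out_gb(usage, ram_gb, other_gb=0.0):
    """Memory that no longer fits in RAM and has to live in swap."""
    needed = usage["extended_gb"] + usage["heap_gb"] + other_gb
    return max(0.0, needed - ram_gb)


# Hypothetical scenario: the quotas are from the article, the rest is assumed.
usage = simulate_work_process(demand_gb=40, em_quota_gb=8, heap_quota_gb=48)
print(usage)
print(swapped_out_gb(usage, ram_gb=64, other_gb=30))
```

With quotas this large the model never reaches the dump before the host runs out of physical memory, which is exactly why the quota did not protect the server.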
- Check the Z program executed by the user. It doesn’t make any sense for it to consume 7GB of extended memory plus tons of heap memory.
- Reduce the memory quotas for extended and heap memory:
  - The default value for the extended memory quota is 2GB, which should be more than enough for any program execution.
  - The heap memory quota must not be almost as large as the sum of swap and physical memory.
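As a rough aid for the second action, the proposed quotas can be compared with the memory of the host before changing the profile. This is only a sketch: the function name is mine, the 2GB default comes from the point above, and the max_share threshold of 0.25 is an arbitrary assumption of mine, not an SAP recommendation. Pick a value that fits your own sizing.

```python
def review_quotas(em_quota_gb, heap_quota_gb, ram_gb, swap_gb,
                  default_em_gb=2.0, max_share=0.25):
    """Compare per-work-process quotas with the virtual memory of the host."""
    virtual_gb = ram_gb + swap_gb
    # Worst case for one work process: both quotas fully consumed
    share = (em_quota_gb + heap_quota_gb) / virtual_gb
    return {
        "em_above_default": em_quota_gb > default_em_gb,
        "single_process_share": share,
        # max_share is a hypothetical threshold, not an official limit
        "risky": share > max_share,
    }
```

For example, review_quotas(8, 48, ram_gb=64, swap_gb=49) uses the quotas from this system together with an assumed 64GB of RAM; it reports that a single work process could claim roughly half of the virtual memory of the host.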
Conclusions
Instance profile configuration is a really important task. When installing and configuring an ABAP system, the instance should be configured for the initial use of the system. This configuration should then be reviewed over the following months and years so that it stays correct for the current and future use of the system.
Also, please keep in mind that programs, processes and transactions should be analyzed and improved before they are deployed to production, and again afterwards. It is a really bad idea to change an instance parameter because a Z program was developed without considering how it would perform with real data.