Best Practices for Oracle RAC databases with SGA size over 100GB
All recommendations should be carefully reviewed by your own operations group and should only be implemented if the potential gain as measured against the associated risk warrants implementation.
As every customer environment is unique, the success of any Oracle Database implementation, including implementations of Oracle RAC, is predicated on a successful test environment. Oracle Support has identified 100 GB as a baseline for large SGAs that would benefit from the recommendations provided in this note. However, this is just a baseline, and it is possible for similar (but smaller) SGAs to benefit from these recommendations. It is thus imperative that any recommendations from this note are thoroughly tested and validated in a test environment that is a replica of the target production environment before being implemented in production, to ensure that there is no negative impact associated with them.
SCOPE
This article applies to all new and existing RAC implementations.
It covers RAC databases only, because most of the parameters listed here are specific to RAC.
DETAILS
Note that the recommendations presented in this note are a result of the experience from working on databases with SGA of 1 TB and 2.6 TB.
However, databases with SGAs of 100 GB and 300 GB also benefited from the recommendations.
Also, some recommendations have been removed for 18.1 and above, so check whether each recommendation is applicable to your database.
Note: ORAchk 18.2 and above can be used to validate the proper settings for Large SGA Databases.
Though the check is available within ORAchk 18.2, it is always recommended to use the latest version of ORAchk which is available via <Document 1268927.2> to ensure you are receiving the most up-to-date information.
Download the latest AHF.
Refer to Autonomous Health Framework (AHF) - Including TFA and ORAchk/EXAchk, Document 2550798.1.
At the time of writing, the latest version is AHF 24.3.0 for Linux.
Spfile / init.ora parameters:
a. Set _lm_sync_timeout to 1200
(this recommendation is valid only for databases that are 12.2 and lower)
Setting this will prevent some timeouts during reconfiguration and DRM. It's a static parameter and rolling restart is supported.
b. Set shared_pool_size to at least 15% of the total SGA size.
For example, if SGA size is 1 TB, the shared pool size should be at least 150 GB. It's a dynamic parameter.
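The 15% rule above can be sketched as a small calculation. This is only an illustration: the function name min_shared_pool_mb is hypothetical, and sizes are expressed in whole MB. The note's "1 TB gives 150 GB" example treats 1 TB as roughly 1000 GB; with 1 TB taken as 1024 GB the result is slightly higher.

```python
def min_shared_pool_mb(sga_mb: int, percent: int = 15) -> int:
    """Smallest whole-MB shared pool that is at least `percent` of the SGA.

    Hypothetical helper for illustration; uses integer ceiling division
    so the result never falls below the requested percentage.
    """
    if sga_mb <= 0:
        raise ValueError("sga_mb must be positive")
    return (sga_mb * percent + 99) // 100


# 1 TB taken as 1,000,000 MB, as in the note's rounded example (150 GB)
rounded_example = min_shared_pool_mb(1_000_000)

# 1 TB taken as 1024 * 1024 MB
binary_example = min_shared_pool_mb(1024 * 1024)
```

The result is a lower bound for shared_pool_size, not a tuned value; validate it in a test environment first.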
c. Set _gc_policy_minimum to 15000
There is no need to set _gc_policy_minimum if DRM is disabled by setting _gc_policy_time=0.
_gc_policy_minimum is a dynamic parameter. _gc_policy_time is a static parameter, and rolling restart is not supported for it. To disable DRM, use _lm_drm_disable instead of _gc_policy_time, because it is dynamic.
Note: 15000 is the new default in 23c, 19c DBRU JUL '23, and 19c ADB.
Customers no longer need to tune this parameter in those releases or later.
This is due to internal bug 34729755.
d. Set _lm_tickets to 5000
(this recommendation is valid only for databases that are 12.2 and lower)
Default is 1000.
Allocating more tickets (used for sending messages) avoids running out of tickets during reconfiguration. It's a static parameter and rolling restart is supported. A rolling restart is fine when increasing the parameter, but a cold restart may be necessary when decreasing it.
e. Set gcs_server_processes to twice the default number of lms processes that are allocated.
(this recommendation is valid only for databases that are 12.2 and lower)
The fix is also included in the 12.2.0.1 JUL 2018 database RU, so this recommendation does not apply to a database that is running on 12.2.0.1 JUL 2018 or higher.
The default number of lms processes depends on the number of CPUs/cores that the server has, so please refer to the gcs_server_processes init.ora parameter section in the Oracle Database Reference Guide for the default number of lms processes for your server. Please make sure that the total number of lms processes of all databases on the server is less than the total number of CPUs/cores on the server.
Please refer to Document 558185.1.
It's a static parameter and rolling restart is supported.
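The two rules in item e (double the default, and keep the server-wide lms total below the CPU/core count) can be checked with a short sketch. The function names are hypothetical, and the default lms count must be looked up in the Oracle Database Reference for your server; it is passed in here rather than computed.

```python
from typing import List


def recommended_gcs_server_processes(default_lms: int) -> int:
    """Twice the default number of lms processes, per the recommendation."""
    if default_lms < 1:
        raise ValueError("default_lms must be at least 1")
    return 2 * default_lms


def lms_budget_ok(lms_per_database: List[int], cpu_count: int) -> bool:
    """True when the lms processes of all databases on the server,
    added together, stay strictly below the number of CPUs/cores."""
    return sum(lms_per_database) < cpu_count


# Illustrative numbers only: two databases, each with a default of 4 lms
planned = [recommended_gcs_server_processes(4), recommended_gcs_server_processes(4)]
fits_on_32_cores = lms_budget_ok(planned, cpu_count=32)
fits_on_16_cores = lms_budget_ok(planned, cpu_count=16)
```

If the check fails, the doubled value would oversubscribe the server and should not be applied as-is.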
f.📍Set TARGET_PDBS to the number of PDBs that are planned to be running in the CDB. 📍
Do not add seed and root in this count.
(This recommendation is valid for 12.2 databases and higher)
The default value of TARGET_PDBS, especially for databases with a large sga_target setting, is known to cause performance and instance eviction issues.
For a detailed description of issues related to TARGET_PDBS, refer to Document 2644243.1.
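As a rough sketch of how the fixed-value recommendations in items a, c, and d map to a database version, the snippet below builds ALTER SYSTEM statements as plain strings. The Recommendation class and alter_statements function are hypothetical, and the choice of SCOPE=SPFILE for static parameters and SCOPE=BOTH for dynamic ones is an assumption for illustration. Hidden (underscore) parameters are double-quoted. Nothing is executed against a database.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Version = Tuple[int, int]  # (major, minor), e.g. (12, 2)


@dataclass(frozen=True)
class Recommendation:
    name: str
    value: int
    dynamic: bool                    # dynamic vs. static, as stated in the note
    max_version: Optional[Version]   # None: no upper bound stated in the note


RECOMMENDATIONS = [
    Recommendation("_lm_sync_timeout", 1200, dynamic=False, max_version=(12, 2)),
    # Already the default in 23c and 19c DBRU JUL '23, so often unnecessary there
    Recommendation("_gc_policy_minimum", 15000, dynamic=True, max_version=None),
    Recommendation("_lm_tickets", 5000, dynamic=False, max_version=(12, 2)),
]


def alter_statements(db_version: Version) -> List[str]:
    """Return ALTER SYSTEM statements for recommendations valid on db_version."""
    statements = []
    for rec in RECOMMENDATIONS:
        if rec.max_version is not None and db_version > rec.max_version:
            continue  # recommendation is only valid for 12.2 and lower
        scope = "BOTH" if rec.dynamic else "SPFILE"
        statements.append(
            f'ALTER SYSTEM SET "{rec.name}"={rec.value} SCOPE={scope} SID=\'*\';'
        )
    return statements


for statement in alter_statements((12, 2)):
    print(statement)
```

Static parameters written with SCOPE=SPFILE take effect only after the restart described for each item.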
Setting up hugepages is a general recommendation for all Linux users, but it is particularly important for databases with a large SGA.
In other words, setting up hugepages when the SGA is large is a critical recommendation.
For other platforms, consider using large pages if possible.
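To get a feel for the scale involved, the sketch below estimates how many hugepages a given SGA needs. It assumes a 2 MB hugepage size, which is common on Linux x86-64 but should be confirmed from the Hugepagesize line in /proc/meminfo. The function name is hypothetical, and the result covers only the SGA of a single instance; it is a starting estimate for vm.nr_hugepages, not a final setting.

```python
def hugepages_needed(sga_mb: int, hugepage_mb: int = 2) -> int:
    """Number of hugepages required to hold an SGA of sga_mb megabytes.

    Assumes a 2 MB hugepage by default; rounds up so the whole SGA fits.
    """
    if sga_mb <= 0 or hugepage_mb <= 0:
        raise ValueError("sizes must be positive")
    return (sga_mb + hugepage_mb - 1) // hugepage_mb


# Illustrative sizes: 100 GB and 1 TB SGAs
pages_100gb = hugepages_needed(100 * 1024)
pages_1tb = hugepages_needed(1024 * 1024)
```

When several instances share a server, their SGAs need to be added together before sizing the hugepage pool.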
The following patches are recommended:
11.2.0.3.5 DB PSU or above is highly recommended to address known issues with large SGA sizes.
For an SGA larger than 4 TB on the Linux platform:
BUG 18780342 - LINUX SUPPORT FOR > 4TB SGA
💡Because the TARGET_PDBS parameter is so important, the rest of this article looks at it in more detail, with an emphasis on fine-tuning it.
🔴Performance Issues when using PDBs with Oracle RAC 19c and 18c
Review of issue
Using the new default sizing for internal data structures, Oracle Real Application Clusters (RAC) CDBs where the number of actual PDBs is lower than the TARGET_PDBS parameter setting can inadvertently be subject to a negative performance impact, because the internal structures are sized for a much larger number of PDBs in the same CDB.
In order to ensure predictable Cache Fusion performance during normal operation as well as to minimize the impact on applications during DRM operations or Oracle RAC reconfiguration operations (as a result of an instance start/stop or PDB open/close for example), customers with lower numbers of PDBs per CDB should set the TARGET_PDBS initialization parameter as described in the workaround section of this note.
High "latch: gcs resource hash" waits, potentially in combination with "gcs drm freeze in enter server mode" or "latch: ges resource hash list" wait events, in AWR reports are indicative of the sizing misalignment described above.
[Figure: AWR report showing high "gcs drm freeze in enter server mode" waits]
[Figure: AWR report showing high "latch: ges resource hash list" waits]
Solution
Set the init.ora parameter TARGET_PDBS to the number of PDBs that are planned to be running in the CDB. Please do not add seed and root in this count.
For example: if the current number of PDBs is 5, but the plan is to run 10 PDBs, TARGET_PDBS should be set to 10 accordingly. The number does not have to be exact, but it should be as close as possible to the number of planned PDBs.
Target_PDBS=<#_PDBs>
This init.ora parameter should be set in the spfile and can be activated using a rolling restart of all the Oracle RAC instances.
This parameter is only pertinent when using Oracle Multitenant databases. It does not apply to non-CDB environments or to Oracle Autonomous Database.
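A minimal sketch of turning the planned PDB count into the spfile change described in the Solution section: the helper name target_pdbs_statement is hypothetical, and it only builds the statement text. SCOPE=SPFILE with SID='*' is used on the assumption that the value should be stored in the spfile for all instances and activated by a rolling restart.

```python
def target_pdbs_statement(planned_pdbs: int) -> str:
    """Build the ALTER SYSTEM statement for TARGET_PDBS.

    planned_pdbs is the number of PDBs planned for the CDB,
    excluding the seed and the root.
    """
    if planned_pdbs < 1:
        raise ValueError("planned_pdbs must be at least 1")
    return f"ALTER SYSTEM SET target_pdbs={planned_pdbs} SCOPE=SPFILE SID='*';"


# The note's example: 5 PDBs today, 10 planned
print(target_pdbs_statement(10))
```

The new value takes effect only after each Oracle RAC instance has been restarted.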
🔴Note: Environments running with a large SGA will be especially affected if the TARGET_PDBS parameter is left at its default.
How is the default calculated?
♨️The default is:
TARGET_PDBS = 2 x (SGA_SIZE / 512 MB)
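To see why a large SGA produces a very large default, the formula can be evaluated for a few SGA sizes. This sketch applies the formula exactly as stated in this article; the function name is hypothetical, the use of integer division is an assumption, and any internal limits Oracle may apply are not modeled.

```python
def default_target_pdbs(sga_mb: int) -> int:
    """Default TARGET_PDBS per the article's formula: 2 x (SGA_SIZE / 512 MB).

    Integer division is an assumption made for this illustration.
    """
    if sga_mb <= 0:
        raise ValueError("sga_mb must be positive")
    return 2 * (sga_mb // 512)


# Illustrative SGA sizes: 100 GB and 1 TB
default_for_100gb = default_target_pdbs(100 * 1024)
default_for_1tb = default_target_pdbs(1024 * 1024)
```

Under this formula a 100 GB SGA yields a default of 400 and a 1 TB SGA yields 4096, both far above the handful of PDBs most CDBs actually run.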
We recommend using ORAchk or EXAchk, which automatically calculates the values for your environment and recommends the best value for TARGET_PDBS.
I hope this article was useful for you.
Best Regards,
Alireza Kamrani