Friday, April 11, 2025

Tuning & Troubleshooting Oracle Data Guard (Part 1)

Tune and Troubleshoot Oracle Data Guard (Part 1 of 8)


             Alireza Kamrani 

               04/10/2025

When redo transport, redo apply, or role transitions are not meeting your expected requirements, use the following guidelines to help you tune and troubleshoot your deployment.


Overview of Oracle Data Guard Tuning and Troubleshooting

To get the best performance from your Oracle Data Guard configuration, use the following Oracle MAA best practices for monitoring, assessment, and performance tuning.


  • Ensure that Oracle Database and Oracle Data Guard configuration best practices are in place.
    The assumption when assessing and tuning is that all of the Oracle Database and Data Guard configuration best practices are already integrated in the environment. Evaluate the adherence to those best practices before doing any tuning.
  • Assess and tune redo transport services
    Oracle Data Guard automatically tunes redo transport to optimize performance. However, if you observe performance issues, you can monitor and tune redo transport services.
    Asynchronous redo transport with Maximum Performance data protection mode is the default Oracle Data Guard configuration. Tuning asynchronous redo transport consists mainly of ensuring that the primary, standby, and network resources are sufficient for handling the workload, and that you monitor those resources for bottlenecks.
    Synchronous redo transport does sacrifice some performance for zero data loss; however, using sound MAA recommended methods, you can monitor and assess the impact and distribute resources appropriately.
  • Assess and tune redo apply
    In most cases, the default Oracle settings result in satisfactory performance for media recovery when the standby is always up to date. However, as applications and databases increase in size and throughput, media recovery operations can benefit from additional tuning to further optimize recovery time or redo apply throughput on a standby database
  • Assess and tune role transitions
    With proper planning and implementation, Oracle Data Guard and Active Data Guard role transitions can effectively minimize downtime and ensure that the database environment is restored with minimal impact on the business. Performance tests using a physical standby database and Oracle Maximum Availability Architecture (MAA) best practices have shown that switchover and failover can be reduced to seconds.


Redo Transport Troubleshooting and Tuning

Oracle Data Guard redo transport performance is directly dependent on the performance of the primary and standby systems, the network that connects them, and the I/O subsystem.

For most Oracle Data Guard configurations, you should be able to achieve zero or minimal data loss by troubleshooting and tuning redo transport.


To improve transport holistically, leverage the data gathering and troubleshooting methodology described in the topics below, which guide you through gathering the necessary data to assess whether there is indeed a redo transport problem and what can be tuned to optimize redo transport throughput.


Gather Topology Information

Understanding the topology of the Oracle Data Guard configuration, and its relevance to Data Guard performance, helps eliminate infrastructure weaknesses that are often incorrectly attributed to the Data Guard architecture.

Oracle recommends that you outline the following high-level architecture information.

  • Describe the primary and standby database system (number of nodes in Oracle RAC cluster, CPUs and memory per database node, storage I/O system)
  • Describe network topology connecting the primary and standby systems
    • Network components/devices in between primary and standby
    • Network bandwidth and latency


📕For standby databases with symmetric hardware and configuration, and with a good tuned network configuration, the transport lag should be less than 10 seconds and in most cases less than 1 second. 


Verify Transport Lag and Understand Redo Transport Configuration

To determine if there is any lag on the standby database, and if this is a transport or apply lag, query the V$DATAGUARD_STATS view.


SQL> select name,value,time_computed,datum_time from v$dataguard_stats where name=’%lag’;


The DATUM_TIME column is the local time on the standby database when the datum used to compute the metric was received. The lag metrics are computed based on data that is periodically received from the primary database. An unchanging value in this column across multiple queries indicates that the standby database is not receiving data from the primary database. The potential data loss in this scenario would be from the last datum time from V$DATAGUARD_STATS to the current time on the standby.


To obtain a histogram that shows the history of transport or apply lag values since the standby instance was last started, query the V$STANDBY_EVENT_HISTOGRAM view.


SQL> select * from v$standby_event_histogram where name like '%lag' and count >0;


To evaluate the transport or apply lag over a time period, take a snapshot ofV$STANDBY_EVENT_HISTOGRAM at the beginning of the time period and compare that snapshot with one taken at the end of the time period.


SQL> col NAME format a10


SQL> select NAME,TIME,UNIT,COUNT,LAST_TIME_UPDATED from V$STANDBY_EVENT_HISTOGRAM 

where name like '%lag' and 

count >0 

order by LAST_TIME_UPDATED;


NAME     TIME UNIT    COUNT LAST_TIME_UPDATED


---------- ---------- -------- ---------- -------


transport lag      41 seconds         3      01/05/2022 16:30:59


transport lag     245 seconds          1      01/05/2022 16:31:02


transport lag     365 seconds          2      01/05/2022 16:31:03


transport lag     451 seconds          2      01/05/2022 16:31:04


Possible scenarios:


  1. You observe a high redo transport lag.
  2. You see no transport lag but a high redo apply lag.


Solution:


  1. Continue this redo transport investigation with Gather Information to Troubleshoot Transport Lag. 
  2. Address the apply lag using the methodology in Redo Apply Troubleshooting and Tuning.


🎞️I will explain the above solution in the next post. 

⏰Wait to publishing continues posts about Tuning and Troubleshooting Oracle Data Guard



Alireza Kamrani 

No comments:

Post a Comment

Tuning and Troubleshooting Synchronous Redo Transport (Part 2)

Tuning and Troubleshooting Synchronous Redo Transport (Part  2 ) Alireza Kamrani (06 /29/ 2025)     Understanding What Causes Outliers Any d...