Friday, April 11, 2025

Tune and Troubleshoot Oracle Data Guard (Part 2 of 8)

Tune and Troubleshoot Oracle Data Guard (Part 2 of 8)


           Alireza Kamrani 

              04/12/2025


In the First part, I demonstrate usage    V$STANDBY_EVENT_HISTOGRAM to finding information about redo apply Lag or Redo transportation Lag.

To evaluate the transport or apply lag over a time period, take a snapshot of V$STANDBY_EVENT_HISTOGRAM at the beginning of the time period and compare that snapshot with one taken at the end of the time period.

As a result there are two scenarios:

Possible scenarios:

  1. You observe a high redo transport lag.
  2. You see no transport lag but a high redo apply lag.

Solution:

  1. Continue this redo transport investigation with Gather Information to Troubleshoot Transport Lag. 
  2. Address the apply lag using the methodology in Redo Apply Troubleshooting and Tuning.

By reading following topic you can find more information .


Gather Information to Troubleshoot Transport Lag

Gather the following information and investigate the questions when an unacceptable redo transport lag is observed:


▶️When did the transport lag occur? Record the V$DATAGUARD_STATS and V$STANDBY_EVENT_HISTOGRAM data to show when the lag started and how the lag is changing over time.


▶️Does the transport lag occur during certain time period, such as daily at 12 midnight for daily batch operations, monthly during large batch operation, or quarterly during quarter end?


▶️Check the LOG_ARCHIVE_DEST setting for any enabled Oracle Data Guard transport, and verify whether redo COMPRESSION or ENCRYPTION is enabled. Overall redo transport throughput can be negatively impacted because redo must be compressed or encrypted before sending, and then uncompressed or unencrypted upon receiving it on the standby. Verify if that change was recent, and if you can test disabling these setting attributes.


▶️Check the Oracle Net settings to evaluate if Oracle Net encryption is enabled. If Oracle Net encryption is enabled, when was it enabled and at what level? Oracle Net encryption can slow down redo throughput significantly because redo is encrypted before sending and unencrypted upon receiving the redo on the standby. Optionally, disable or reduce encryption levels to see if the redo transport lag reduces. 


Compare Redo Generation Rate History on the Primary

There are cases where the primary database redo generation rate is exceptionally high for a short period of time, such as during large batch jobs, data loads, data pump operations, create table as select, PDML operations, or end of month, quarter, or year batch updates.

Obtain the redo generation history from the primary database and compare that to when the redo transport or redo apply lag started. 


Check if the redo generation rate is exceptionally high because of additional workloads, such as adding new pluggable databases or new application services. 

By doing so, additional tuning may be required to accommodate this additional load. 


As part of troubleshooting, gather the following information or address the following questions:


  • Gather daily history of primary database's redo generation rate using this query.


SQL> select trunc(completion_time) as "DATE", count(*) as "LOG SWITCHES", round(sum(blocks*block_size)/1024/1024) as "REDO PER DAY (MB)" 

from v$archived_log 

where dest_id=1 

group by trunc(completion_time) order by 1;


  • Gather per log redo generation rate starting 6 hours prior to start any redo or transport lag.


SQL> alter session set nls_date_format='YYYY/MM/DD HH24:MI:SS';


SQL> select thread#,sequence#,blocks*block_size/1024/1024 MB,(next_time-first_time)*86400 sec, blocks*block_size/1024/1024)/((next_time-first_time)*86400) "MB/s" from v$archived_log 

where ((next_time-first_time)*86400<>0) 

and first_time between to_date('2025/01/15 08:00:00','YYYY/MM/DD HH24:MI:SS') 

and to_date('2025/01/15 11:00:00','YYYY/MM/DD HH24:MI:SS') 

and dest_id=1 order by first_time;


▶️Gather hourly snapshots of the redo generation rate from the Automatic Workload Repository (AWR) report 6 hours before the start of any redo or transport lag. 
By default, Oracle Database automatically generates snapshots once every hour; however, you may want to manually create snapshots to capture statistics at times different from those of the automatically generated snapshots. To view information about an existing 
snapshot, use the DBA_HIST_SNAPSHOT view. 
You can find information about Snapshot at the end of this post. 


▶️Is this primary redo generation rate exceptionally high compared to prior history?


▶️If possible, determine the workload that corresponds to the high redo generation rate and evaluate whether it's transient or if it can be tuned. 
For example, for large purge operations, consider truncate or drop partition operations to reduce the redo generation volumes.


Creating Snapshots

By default, Oracle Database automatically generates snapshots once every hour. However, you may want to manually create snapshots to capture statistics at times different from those of the automatically generated snapshots.

Creating Snapshots Using the Command-Line Interface

To manually create snapshots, use the CREATE_SNAPSHOT procedure. 

The following example shows a CREATE_SNAPSHOT procedure call.


BEGIN

DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT();

END;

/


In this example, a snapshot is created immediately on the local database instance. To view information about an existing snapshot, use the DBA_HIST_SNAPSHOT view.


Note:You can specify value for the flush_level parameter of the CREATE_SNAPSHOT procedure to either TYPICAL or ALL. 

The default value for the flush level is TYPICAL.


The flush level signifies the breadth and depth of the AWR statistics to be captured. If you want to capture all the AWR statistics, then set the flush level to ALL. If you want to skip few AWR statistics, such as, SQL statistics, segment statistics, and files and tablespace statistics for performance reasons, then set the flush level to TYPICAL.


Alireza Kamrani 

No comments:

Post a Comment

Tune and Troubleshoot Oracle Data Guard (Part 8 of 8)

📚What is recommendations Steps before Switchover Alireza Kamrani  04/21/2025 Tune Redo Apply by Evaluating Database Wait Events Once you ha...