Wednesday, April 16, 2025

Tune and Troubleshoot Oracle Data Guard (Part 5 of 8)

Alireza Kamrani 

  04/16/2025


Advanced Troubleshooting: Determining Network Time with Asynchronous Redo Transport


Given enough resources, especially network bandwidth, asynchronous redo transport can maintain pace with very high workloads. In cases where resources are constrained, asynchronous redo transport can begin to fall behind, resulting in a growing transport lag on the standby database.
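
Before moving on to trace-level diagnostics, a growing lag is easiest to confirm on the standby itself. This is a minimal sketch, assuming the standard V$DATAGUARD_STATS view on the standby database:

-- run on the standby; a transport lag value that keeps increasing between
-- successive runs indicates that redo shipping is not keeping up
select name, value, time_computed
  from v$dataguard_stats
 where name in ('transport lag', 'apply lag');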


Asynchronous redo transport (ASYNC) transmits redo data asynchronously with respect to transaction commitment. A transaction can commit without waiting for an acknowledgment that the redo generated by that transaction was successfully transmitted to a remote standby database. 
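
To confirm which transport mode a destination is actually using, the destination attributes can be checked on the primary. This is a minimal sketch, assuming the standby is served by LOG_ARCHIVE_DEST_2 as in the trace output later in this post:

-- run on the primary; for an ASYNC (NOAFFIRM) destination, transmit_mode
-- is expected to show ASYNCHRONOUS and affirm should show NO
select dest_id, dest_name, transmit_mode, affirm, status
  from v$archive_dest
 where dest_id = 2;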


With ASYNC, the primary database Log Writer Process (LGWR) continues to acknowledge commit success even if limited bandwidth prevents the redo of previous transactions from being sent to the standby database immediately (picture a sink filling with water faster than it can drain).


ASYNC uses a TT00 process to transmit redo directly from the log buffer of the primary database. If the TT00 process is unable to keep pace, and the log buffer is recycled before the redo can be transmitted to the standby database, then the TT00 process automatically transitions to reading and sending from the online redo log file (ORL) on disk.


Once TT00 transmission has caught up with current redo generation, it automatically transitions back to reading and sending directly from the log buffer.

If two or more log switches occur before the TT00 has finished sending the original ORL, the TT00 still transitions back to reading the contents of the current ORL. Any ORLs that were archived between the original ORL and the current ORL are automatically transmitted using Oracle Data Guard’s redo gap resolution process.
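
Whether those intervening sequences actually reached the standby can be verified from the primary. A hedged sketch, again assuming the standby is destination 2 (adjust DEST_ID and the time window to your configuration):

-- run on the primary; lists sequences shipped to the standby destination
-- over roughly the last hour
select dest_id, thread#, sequence#, applied, completion_time
  from v$archived_log
 where dest_id = 2
   and completion_time > sysdate - 1/24
 order by thread#, sequence#;

A gap in the SEQUENCE# values for the standby destination indicates redo that has not yet been shipped or gap-resolved.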


Sufficient resources, such as network bandwidth, CPU, memory, and log file I/O on both the primary and standby databases are critical to the performance of an asynchronous Data Guard configuration.

To determine which resource is constraining asynchronous transport, use krsb stats, which can be enabled by setting event 16421 on both the primary and standby databases:


alter session set events '16421 trace name context forever, level 3';


This event is very lightweight and won't affect the performance of the primary or standby database.

This dynamic event should be set on all primary and standby instances, and it will write statistics into the TT00 or remote file server (RFS) trace file when shipping for a given sequence has completed. Looking in the trace file, you will see the krsb_end stats at the beginning and end of the file. The stats at the end of the file will provide insight into where asynchronous shipping was spending time. For example:


krsb_end: Begin stats dump for T-1.S-593

  max number of buffers in use             10

  Operation elapsed time (micro seconds)   474051333

  File transfer time (micro seconds)       474051326

  Network Statistics

   LOG_ARCHIVE_DEST_2 : OCI REQUEST

    Total count  : OCI REQUEST             2748

    Total time   : OCI REQUEST             81374

    Average time : OCI REQUEST             29

   LOG_ARCHIVE_DEST_2 : NETWORK SEND

    Total count  : NETWORK SEND            2748

    Total time   : NETWORK SEND            286554724

    Average time : NETWORK SEND            104277

    Total data buffers queued              9644

    Total data buffers completed           9644

    Total bytes written                    9885272064

    Total bytes completed synchronously    9885272064

    Average network send size (blocks)     7025

    Average network send buffers           3.51

    Average buffer turnaround time         240889

    Throughput (MB/s)                      19.89

   Total network layer time                286636098

   Percentage of time in network           60.47

  Disk Statistics

    Total count  : DISK READ               11531

    Total time   : DISK READ               12335132

    Average time : DISK READ               1069

    Read-ahead blocks                      14151680

    Log buffer blocks                      266915

    Disk stall blocks                      4888576

    Total count  : BUFFER RELEASE          9643

    Total time   : BUFFER RELEASE          7229

    Average time : BUFFER RELEASE          0

   Total disk layer time                   12342361

   Percentage of time in disk layer        2.60

  Data Guard Processing Statistics

    Total count  : SLEEP                   198

    Total time   : SLEEP                   172351312

    Average time : SLEEP                   870461

   Total DG layer time                     175072867

   Percentage of time in DG layer          36.93

  Remote Server-Side Network Statistics

   LOG_ARCHIVE_DEST_2 : NETWORK GET

    Total count  : NETWORK GET             8242

    Total bytes  : NETWORK GET             9885272064

    Total time   : NETWORK GET             453233790

    Average time : NETWORK GET             54990

   Total server-side network layer time    453233790

   Percentage of time in network           95.61

  Remote Server-Side Disk Statistics

   LOG_ARCHIVE_DEST_2 : DISK WRITE

    Total count  : DISK WRITE              9644

    Total time   : DISK WRITE              8731303

    Average time : DISK WRITE              905

   LOG_ARCHIVE_DEST_2 : DISK NOSTALL REAP

    Total count  : DISK NOSTALL REAP       9644

    Total time   : DISK NOSTALL REAP       579066

    Average time : DISK NOSTALL REAP       60

   LOG_ARCHIVE_DEST_2 : BUFFER GET

    Total count  : BUFFER GET              9644

    Total time   : BUFFER GET              3607

    Average time : BUFFER GET              0

   Total server-side disk layer time       9313976

   Percentage of time in disk layer        1.96

  Remote Server-Side Data Guard Processing Statistics

   LOG_ARCHIVE_DEST_2 : PUBLISH RTA BOUNDARY

    Total count  : PUBLISH RTA BOUNDARY    8948

    Total time   : PUBLISH RTA BOUNDARY    3665841

    Average time : PUBLISH RTA BOUNDARY    409

   LOG_ARCHIVE_DEST_2 : VALIDATE BUFFER

    Total count  : VALIDATE BUFFER         9644

    Total time   : VALIDATE BUFFER         1403088

    Average time : VALIDATE BUFFER         145

   Total Server-Side DG layer time         11503560

   Percentage of time in DG layer          2.43

krsb_end: End stats dump


The above output comes from a test run where a transport lag is just beginning to occur. 

As the lag caused by network congestion grows, the percentage of time spent waiting in the network layer rises above 50%, as seen here (60.47%). If a transport lag is instead the result of compression or encryption, the percentage of time spent in the Data Guard layer becomes the majority.
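
To find the trace files that contain these krsb_end sections, the trace directory can be queried on each instance. A minimal sketch using the standard V$DIAG_INFO view (the TT00 trace files are written here on the primary, and the RFS trace files on the standby):

-- returns the Automatic Diagnostic Repository trace directory for this instance
select value from v$diag_info where name = 'Diag Trace';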


To disable krsb stats, set event 16421 to level 1:


alter session set events '16421 trace name context forever, level 1';


Alireza Kamrani♠️
