Alireza Kamrani
04/16/2025
Advanced Troubleshooting: Determining Network Time with Asynchronous Redo Transport
Given enough resources, especially network bandwidth, asynchronous redo transport can keep pace with very high workloads. When resources are constrained, asynchronous redo transport can begin to fall behind, resulting in a growing transport lag on the standby database.
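Before digging into trace-level statistics, the current size of a transport lag can be checked directly on the standby database. The query below is a minimal sketch using the standard V$DATAGUARD_STATS view; the values are reported as day-to-second interval strings.

```sql
-- Run on the standby database.
-- VALUE is an interval string, e.g. +00 00:01:30 for a 90-second lag.
SELECT name, value, time_computed
FROM   v$dataguard_stats
WHERE  name IN ('transport lag', 'apply lag');
```

A steadily increasing transport lag value over successive samples indicates that redo is being generated faster than it can be shipped.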
Asynchronous redo transport (ASYNC) transmits redo data asynchronously with respect to transaction commitment. A transaction can commit without waiting for an acknowledgment that the redo generated by that transaction was successfully transmitted to a remote standby database.
With ASYNC, the primary database Log Writer Process (LGWR) continues to acknowledge commit success even if limited bandwidth prevents the redo of previous transactions from being sent to the standby database immediately (picture a sink filling with water faster than it can drain).
ASYNC uses a TT00 process to transmit redo directly from the log buffer of the primary database. If the TT00 process is unable to keep pace, and the log buffer is recycled before the redo can be transmitted to the standby database, then the TT00 process automatically transitions to reading and sending from the online redo log file (ORL) on disk.
Once TT00 transmission has caught up with current redo generation, it automatically transitions back to reading and sending directly from the log buffer.
If two or more log switches occur before the TT00 process has finished sending the original ORL, the TT00 process still transitions back to reading the contents of the current online redo log file. Any ORLs that were archived between the original ORL and the current ORL are automatically transmitted using Oracle Data Guard’s redo gap resolution process.
Sufficient resources, such as network bandwidth, CPU, memory, and log file I/O on both the primary and standby databases are critical to the performance of an asynchronous Data Guard configuration.
To determine which resource is constraining asynchronous transport, use krsb stats, which can be enabled by setting event 16421 on both the primary and standby databases:
alter session set events '16421 trace name context forever, level 3';
This event is very lightweight and does not measurably affect the performance of the primary or standby database.
This dynamic event should be set on all primary and standby instances. It writes statistics into the TT00 or remote file server (RFS) trace file when shipping for a given sequence has completed. In the trace file you will see krsb_end stats at both the beginning and the end of each dump; the stats at the end provide insight into where asynchronous shipping was spending time. For example:
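The trace files are written to the instance's diagnostic trace directory, which can be located with a quick query. (The file name patterns shown are an assumption based on the usual &lt;sid&gt;_&lt;process&gt;_&lt;ospid&gt;.trc convention; verify against your environment.)

```sql
-- Find the trace directory on each instance
SELECT value FROM v$diag_info WHERE name = 'Diag Trace';

-- The TT00 trace file on the primary is typically named like
--   <ORACLE_SID>_tt00_<ospid>.trc
-- and the RFS trace file on the standby like
--   <ORACLE_SID>_rfs_<ospid>.trc
```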
krsb_end: Begin stats dump for T-1.S-593
max number of buffers in use 10
Operation elapsed time (micro seconds) 474051333
File transfer time (micro seconds) 474051326
Network Statistics
LOG_ARCHIVE_DEST_2 : OCI REQUEST
Total count : OCI REQUEST 2748
Total time : OCI REQUEST 81374
Average time : OCI REQUEST 29
LOG_ARCHIVE_DEST_2 : NETWORK SEND
Total count : NETWORK SEND 2748
Total time : NETWORK SEND 286554724
Average time : NETWORK SEND 104277
Total data buffers queued 9644
Total data buffers completed 9644
Total bytes written 9885272064
Total bytes completed synchronously 9885272064
Average network send size (blocks) 7025
Average network send buffers 3.51
Average buffer turnaround time 240889
Throughput (MB/s) 19.89
Total network layer time 286636098
Percentage of time in network 60.47
Disk Statistics
Total count : DISK READ 11531
Total time : DISK READ 12335132
Average time : DISK READ 1069
Read-ahead blocks 14151680
Log buffer blocks 266915
Disk stall blocks 4888576
Total count : BUFFER RELEASE 9643
Total time : BUFFER RELEASE 7229
Average time : BUFFER RELEASE 0
Total disk layer time 12342361
Percentage of time in disk layer 2.60
Data Guard Processing Statistics
Total count : SLEEP 198
Total time : SLEEP 172351312
Average time : SLEEP 870461
Total DG layer time 175072867
Percentage of time in DG layer 36.93
Remote Server-Side Network Statistics
LOG_ARCHIVE_DEST_2 : NETWORK GET
Total count : NETWORK GET 8242
Total bytes : NETWORK GET 9885272064
Total time : NETWORK GET 453233790
Average time : NETWORK GET 54990
Total server-side network layer time 453233790
Percentage of time in network 95.61
Remote Server-Side Disk Statistics
LOG_ARCHIVE_DEST_2 : DISK WRITE
Total count : DISK WRITE 9644
Total time : DISK WRITE 8731303
Average time : DISK WRITE 905
LOG_ARCHIVE_DEST_2 : DISK NOSTALL REAP
Total count : DISK NOSTALL REAP 9644
Total time : DISK NOSTALL REAP 579066
Average time : DISK NOSTALL REAP 60
LOG_ARCHIVE_DEST_2 : BUFFER GET
Total count : BUFFER GET 9644
Total time : BUFFER GET 3607
Average time : BUFFER GET 0
Total server-side disk layer time 9313976
Percentage of time in disk layer 1.96
Remote Server-Side Data Guard Processing Statistics
LOG_ARCHIVE_DEST_2 : PUBLISH RTA BOUNDARY
Total count : PUBLISH RTA BOUNDARY 8948
Total time : PUBLISH RTA BOUNDARY 3665841
Average time : PUBLISH RTA BOUNDARY 409
LOG_ARCHIVE_DEST_2 : VALIDATE BUFFER
Total count : VALIDATE BUFFER 9644
Total time : VALIDATE BUFFER 1403088
Average time : VALIDATE BUFFER 145
Total Server-Side DG layer time 11503560
Percentage of time in DG layer 2.43
krsb_end: End stats dump
The above output comes from a test run where a transport lag is just beginning to occur.
As the lag grows due to network congestion, the percentage of time spent waiting on the network layer rises above 50%. If a transport lag were instead the result of compression or encryption, the majority of the time would be spent in the Data Guard layer.
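The reported percentages are simply each layer's total time divided by the operation elapsed time, so on the sending side they should sum to roughly 100%. As an illustration, the figures from the dump above can be cross-checked with a quick query (the numbers are copied from the sample output; DUAL is used only for the arithmetic):

```sql
SELECT ROUND(286636098 / 474051326 * 100, 2) AS pct_network,          -- 60.47
       ROUND(12342361  / 474051326 * 100, 2) AS pct_disk,             -- 2.60
       ROUND(175072867 / 474051326 * 100, 2) AS pct_dg,               -- 36.93
       ROUND(9885272064 / 474051326 * 1e6 / 1048576, 2) AS mb_per_sec -- 19.89
FROM dual;
```

Here 60.47 + 2.60 + 36.93 = 100.00, and the throughput figure (total bytes written divided by file transfer time) matches the 19.89 MB/s reported in the dump.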
To disable krsb stats, set event 16421 to level 1:
alter session set events '16421 trace name context forever, level 1';