Sunday, June 29, 2025

Tuning and Troubleshooting Synchronous Redo Transport (Part 2)


Alireza Kamrani (06/29/2025)

 

 

Understanding What Causes Outliers

Any disruption to the I/O on the primary or standby databases, or spikes in network latency, can cause high log file sync outliers with synchronous redo transport. 

You can see this effect when the standby system's I/O subsystem is inferior to that of the primary system.

Administrators often host additional databases, such as development and test databases, on standby systems, which can impair I/O response.

It is important to monitor I/O with iostat to determine whether the disks reach their maximum IOPS, because saturated disks slow down SYNC writes.
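The iostat check described above can be sketched as a small filter. This is a minimal sketch that assumes you have already collected the r/s and w/s values per device from `iostat -x` and know each device's rated maximum IOPS; the device names, ratings, and 90% threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DeviceSample:
    """One interval of `iostat -x` output for a single device."""
    device: str
    reads_per_sec: float   # value of the r/s column
    writes_per_sec: float  # value of the w/s column


def saturated_devices(samples, rated_iops, threshold=0.9):
    """Return (device, observed_iops, rated_iops) for devices near their IOPS limit.

    rated_iops maps device name -> maximum IOPS the device can sustain, taken from
    vendor specifications or your own benchmarks (an assumption, not measured here).
    threshold is a hypothetical alert level: 0.9 means 90% of the rated IOPS.
    """
    flagged = []
    for s in samples:
        limit = rated_iops.get(s.device)
        if not limit:
            continue  # no rating known for this device, so it cannot be judged
        observed = s.reads_per_sec + s.writes_per_sec
        if observed >= threshold * limit:
            flagged.append((s.device, observed, limit))
    return flagged


# Hypothetical standby devices and readings.
samples = [
    DeviceSample("sdb", reads_per_sec=1200.0, writes_per_sec=8300.0),
    DeviceSample("sdc", reads_per_sec=150.0, writes_per_sec=900.0),
]
print(saturated_devices(samples, rated_iops={"sdb": 10000, "sdc": 10000}))
```

Summing reads and writes treats every I/O as equal; if your storage rates reads and writes differently, compare them against separate limits instead.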

Frequent log switches are a significant cause of outliers. Consider what happens on the standby when the primary switches logs:

1. The remote file server (RFS) process on the standby must finish updates to the standby redo log header.
2. RFS then switches to a new standby redo log, with additional header updates.
3. Switching logs forces a full checkpoint on the standby. This causes all dirty buffers in the buffer cache to be written to disk, producing a spike in write I/O. In a non-symmetric configuration, where the standby storage subsystem does not perform as well as the primary's, this results in higher I/O latency.
4. The previous standby redo log must be archived, increasing both read and write I/O.
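Because every log switch repeats the standby work listed above, it helps to know how often switches occur. A minimal sketch, assuming a steady redo rate of 25 MB/s (the rate used in the examples in this post) and hypothetical redo log sizes; real workloads are burstier, so peaks will switch more often than this estimate suggests.

```python
def seconds_between_switches(log_size_mb: float, redo_rate_mb_s: float) -> float:
    """Approximate interval between log switches at a steady redo rate."""
    if redo_rate_mb_s <= 0:
        raise ValueError("redo rate must be positive")
    return log_size_mb / redo_rate_mb_s


def switches_per_hour(log_size_mb: float, redo_rate_mb_s: float) -> float:
    """Approximate number of log switches per hour at a steady redo rate."""
    return 3600 / seconds_between_switches(log_size_mb, redo_rate_mb_s)


# Hypothetical log sizes (MB) at an assumed steady 25 MB/s of redo.
for size_mb in (1024, 4096, 16384):
    print(f"{size_mb:6d} MB: every {seconds_between_switches(size_mb, 25):6.1f} s, "
          f"{switches_per_hour(size_mb, 25):5.1f} switches/hour")
```

Each switch in the output corresponds to one round of header updates, a full checkpoint, and an archive on the standby, so larger logs spread that work out.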

Effects of Synchronous Redo Transport Remote Writes

When you enable synchronous redo transport (SYNC), commit processing requires a remote write, in which the remote file server (RFS) writes to a standby redo log, in addition to the normal local write.

Depending on network latency and remote I/O bandwidth, this remote write can increase commit processing time.

Because commit processing takes longer, more sessions wait for the log writer process (LGWR) to finish its current work and begin work on their commit requests; that is, application concurrency increases.

You can observe increased application concurrency by analyzing database statistics and wait events.

Consider the example in the following table.

Effect of SYNC Transport Increasing Application Concurrency

Metric                                  SYNC = Defer   SYNC = Yes   Impact
Redo rate (MB/s)                        25             25           0
Network latency                         0              0            -
TPS from AWR                            5,514.94       5,280.20     -4%
log file sync average (ms)              0.74           2.6          +251%
log file parallel write average (ms)    0.47           0.51         +8.5%
RFS random I/O (ms)                     NA             0.65         NA
SYNC remote write average (ms)          NA             0.95         NA
Redo write size (KB)                    10.58          20.50        +93.8%
Redo writes                             2,246,356      989,791      -55.9%

 

In the above example, enabling SYNC reduced the number of redo writes, but increased the size of each redo write. 

Because the size of the redo write increased, you can expect the time spent doing the I/O (both local and remote) to increase. 

The log file sync wait time is higher because there is more work per wait.

However, at the application level, the transaction rate and transaction response time might change very little, because each redo write now services commits from more sessions. In the table above, TPS dropped by only 4%.

This is why it is important to measure the impact of SYNC at the application level, and not depend entirely on database wait events.

It is also a perfect example of why the log file sync wait event is a misleading indicator of the actual impact SYNC has on the application.
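To make the contrast concrete, the following sketch recomputes the Impact column from the Defer and SYNC values in the concurrency table above. The helper name is illustrative; the figures are copied from that table.

```python
def pct_change(baseline: float, new: float) -> float:
    """Percentage change from the baseline value to the new value."""
    return (new - baseline) / baseline * 100.0


# (SYNC = Defer, SYNC = Yes) pairs copied from the concurrency table in this post.
metrics = {
    "TPS from AWR": (5514.94, 5280.20),
    "log file sync average (ms)": (0.74, 2.6),
    "log file parallel write average (ms)": (0.47, 0.51),
    "Redo write size (KB)": (10.58, 20.50),
    "Redo writes": (2246356, 989791),
}

for name, (defer, sync) in metrics.items():
    print(f"{name:38s} {pct_change(defer, sync):+7.1f}%")
```

The wait event grows by about 251% while TPS falls by only about 4.3% (rounded to -4% in the table), which is why the application-level metric is the one to judge SYNC by.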


Example of Synchronous Redo Transport Performance Troubleshooting

To analyze synchronous redo transport performance, calculate the local redo write latency, the average size of each redo write, and the overall redo write latency.

Use the following wait events and statistics for the calculations:

• local redo write latency = 'log file parallel write'
• remote write latency = 'SYNC remote write'
• average redo write size per write = 'redo size' / 'redo writes'
• average commit latency seen by foregrounds = 'log file sync'
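The four calculations in the list above can be collected into one helper. This is a minimal sketch: the class and field names are hypothetical, the redo size and redo write totals are invented round numbers for illustration, and the latencies are taken from the SYNC column of the assessment table in this post.

```python
from dataclasses import dataclass


@dataclass
class AwrFigures:
    """Figures copied by hand from one AWR report (field names are illustrative)."""
    redo_size_bytes: float              # statistic 'redo size' for the interval
    redo_writes: int                    # statistic 'redo writes' for the interval
    log_file_parallel_write_ms: float   # average wait, 'log file parallel write'
    sync_remote_write_ms: float         # average wait, 'SYNC remote write'
    log_file_sync_ms: float             # average wait, 'log file sync'


def sync_transport_summary(f: AwrFigures) -> dict:
    """Derive the figures used to assess synchronous redo transport."""
    if f.redo_writes <= 0:
        raise ValueError("redo writes must be positive")
    return {
        "local redo write latency (ms)": f.log_file_parallel_write_ms,
        "remote write latency (ms)": f.sync_remote_write_ms,
        "average redo write size (KB)": f.redo_size_bytes / f.redo_writes / 1024,
        "average commit latency (ms)": f.log_file_sync_ms,
    }


# Hypothetical redo totals; latencies from a SYNC-enabled example run.
figures = AwrFigures(
    redo_size_bytes=1_048_576_000,
    redo_writes=50_000,
    log_file_parallel_write_ms=0.62,
    sync_remote_write_ms=3.45,
    log_file_sync_ms=4.60,
)
for name, value in sync_transport_summary(figures).items():
    print(f"{name:32s} {value:8.2f}")
```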

The following table shows statistics from an Automatic Workload Repository (AWR) report on an Oracle database.

SYNC was enabled to a local standby with 1 ms network latency, and its performance impact was compared against a baseline with SYNC disabled.

 

Assessing Synchronous Redo Transport Performance with Oracle Database

Metric                                  Baseline (No SYNC)  SYNC        Impact
redo rate (MB/s)                        25                  25          no change
log file sync average (ms)              0.68                4.60        +576%
log file parallel write average (ms)    0.57                0.62        +8.8%
TPS                                     7,814.92            6,224.03    -20.3%
RFS random I/O (ms)                     NA                  2.89        NA
SYNC remote write average (ms)          NA                  3.45        NA
redo writes                             2,312,366           897,751     -61.2%
redo write size (KB)                    10.58               20.50       +93.8%

 

In the above example, observe that the average log file sync wait increased dramatically after SYNC was enabled, from 0.68 ms to 4.60 ms.

While the local writes remained fairly constant, the biggest factor in increasing log file sync was the addition of the SYNC remote write. 

Because the network latency component of the SYNC remote write is zero, focusing on the remote write into the standby redo log (RFS random I/O) shows an average time of 2.89 ms.

This is an immediate red flag given that the primary and standby were using the same hardware, and the SYNC remote write average time should be similar to the primary's log file parallel write average time.
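The red-flag check described above can be expressed as a small calculation. This is a minimal sketch that assumes, as the text does, that the RFS random I/O figure is the standby redo log write time contained within the SYNC remote write; the function name and the 2x threshold are hypothetical, not Oracle-documented values.

```python
def diagnose_sync_remote_write(sync_remote_write_ms: float,
                               rfs_random_io_ms: float,
                               local_lfpw_ms: float,
                               max_ratio: float = 2.0) -> dict:
    """Split SYNC remote write time and compare standby I/O with local I/O.

    max_ratio is an assumed alert threshold: how many times slower the standby
    redo log write may be than the primary's 'log file parallel write'.
    """
    # Time spent outside the standby redo log write (network round trip and
    # other processing, under the assumption stated above).
    other_ms = sync_remote_write_ms - rfs_random_io_ms
    ratio = rfs_random_io_ms / local_lfpw_ms
    return {
        "rfs_io_ms": rfs_random_io_ms,
        "other_ms": round(other_ms, 2),
        "standby_vs_local_io_ratio": round(ratio, 2),
        "red_flag": ratio > max_ratio,
    }


# SYNC column of the assessment table: remote write 3.45, RFS I/O 2.89, local 0.62.
print(diagnose_sync_remote_write(3.45, 2.89, 0.62))
```

With these figures the standby redo log write is roughly 4.7 times slower than the local write on identical hardware, which is what makes it a red flag.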

In this example, the standby redo logs had multiple members and were placed in a slower disk group.

After reducing the standby redo logs to a single member and placing them in a fast disk group, you can see results such as those shown in the following table.

 

SYNC Performance After Reducing Standby Redo Logs to a Single Member and Placing on a Fast Disk Group

Metric                                  Baseline (No SYNC)  SYNC        Impact
redo rate (MB/s)                        25                  25          no change
log file sync average (ms)              0.67                1.60        +139%
log file parallel write average (ms)    0.51                0.63        +23.5%
TPS                                     7,714.36            7,458.08    -3.3%
RFS random I/O (ms)                     NA                  0.89        NA
SYNC remote write average (ms)          NA                  1.45        NA
redo writes                             2,364,388           996,532     -57.9%
redo write size (KB)                    10.61               20.32       +91.5%
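Comparing the SYNC columns of the assessment table and the table above shows how much the standby redo log change helped. A minimal sketch; the helper name is illustrative, and comparing the two SYNC runs directly assumes their baselines are equivalent, which the near-identical baseline log file sync values (0.68 ms and 0.67 ms) suggest.

```python
# Average SYNC-enabled latencies (ms) before and after moving the standby redo
# logs to a single member on a fast disk group, from the tables in this post.
before = {"log file sync": 4.60, "SYNC remote write": 3.45, "RFS random I/O": 2.89}
after = {"log file sync": 1.60, "SYNC remote write": 1.45, "RFS random I/O": 0.89}


def reduction_pct(old: float, new: float) -> float:
    """Percentage reduction from old to new (positive means faster)."""
    return (old - new) / old * 100.0


for name in before:
    print(f"{name:20s} {before[name]:5.2f} -> {after[name]:5.2f} ms "
          f"({reduction_pct(before[name], after[name]):.1f}% lower)")
```

After the change, the RFS random I/O of 0.89 ms is close to the primary's local log file parallel write of 0.63 ms, the symmetry expected when primary and standby use the same hardware.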

 
