Tune and Troubleshoot Oracle Data Guard (Part 3 of 8)
Alireza Kamrani
04/13/2025
Evaluate the Transport Network and Tune
Redo transport consists of a background process on the primary database instance sending redo to a background process on the standby database. You can evaluate whether the network is optimized for Oracle Data Guard redo transport.
If asynchronous redo transport is configured, redo data is streamed to the standby in large packets asynchronously. To tune asynchronous redo transport over the network, you need to optimize a single process network transfer.
If synchronous redo transport is configured, each redo write must be acknowledged by both the primary and standby databases before the next redo write can proceed. You can optimize synchronous transport by using Fast Sync (SYNC NOAFFIRM in the LOG_ARCHIVE_DEST_n setting, or LogXptMode FASTSYNC in the Data Guard broker), but higher network latency (for example, more than 5 ms) still reduces overall redo transport throughput.
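To make the latency effect concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a simplified model that is not taken from the text above: the local log write and the remote send run in parallel, so one synchronous redo write takes roughly the longer of the two paths, and Fast Sync removes the standby disk write from the remote path. The function names and the sample numbers are illustrative only.

```python
def sync_write_time_ms(local_write_ms, rtt_ms, standby_write_ms, fast_sync=False):
    """Approximate time for one synchronous redo write (simplified model).

    Assumption: the local write and the remote path overlap, so the
    slower of the two determines the total time.
    """
    # With Fast Sync the standby acknowledges before its disk write completes.
    remote_ms = rtt_ms if fast_sync else rtt_ms + standby_write_ms
    return max(local_write_ms, remote_ms)


def max_serial_writes_per_sec(write_time_ms):
    """Upper bound on strictly serial redo writes per second."""
    return 1000.0 / write_time_ms


if __name__ == "__main__":
    # Illustrative numbers only: 1 ms local write, 5 ms round trip, 2 ms standby write.
    for fast in (False, True):
        t = sync_write_time_ms(1.0, 5.0, 2.0, fast_sync=fast)
        print(f"fast_sync={fast}: {t} ms per write, "
              f"at most {max_serial_writes_per_sec(t):.0f} serial writes/s")
```

Under this model the round-trip time sets a floor that Fast Sync cannot remove, which is why latency matters even with a fast standby I/O system.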
Before you continue, review Assessing and Optimizing Network Performance (see Part 1 of this series) to:
- Assess whether you have sufficient network bandwidth to support the primary's redo generation rate
- Determine optimal TCP socket buffer sizes to tune redo transport
- Tune operating system limits on socket buffer sizes to tune redo transport
- Determine optimal MTU setting for redo write size
- Tune MTU to increase network throughput for redo transport
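The first two checks in the list above come down to simple arithmetic. The sketch below is a minimal illustration, not the procedure from Part 1: it converts a redo generation rate into the network bandwidth it needs, and computes the bandwidth-delay product (BDP), which is the usual starting point for sizing TCP socket buffers. The `overhead_factor` and `headroom` parameters are hypothetical knobs added for illustration; take the actual multipliers from the guidance for your Oracle release.

```python
def required_bandwidth_mbps(redo_bytes_per_sec, overhead_factor=1.0):
    """Network bandwidth (megabits/s) needed to ship redo at the given rate.

    overhead_factor is a hypothetical allowance for protocol overhead.
    """
    return redo_bytes_per_sec * 8 * overhead_factor / 1_000_000


def bandwidth_delay_product_bytes(bandwidth_mbps, rtt_ms):
    """Bytes in flight needed to keep a link of this speed and latency full."""
    bytes_per_sec = bandwidth_mbps * 1_000_000 / 8
    return int(bytes_per_sec * rtt_ms / 1000)


def suggested_socket_buffer_bytes(bandwidth_mbps, rtt_ms, headroom=1.0):
    """BDP scaled by a hypothetical headroom multiplier."""
    return int(bandwidth_delay_product_bytes(bandwidth_mbps, rtt_ms) * headroom)


if __name__ == "__main__":
    # Illustrative numbers: 25 MB/s peak redo, 1 Gbit/s link, 10 ms round trip.
    print(required_bandwidth_mbps(25_000_000))      # megabits per second
    print(bandwidth_delay_product_bytes(1000, 10))  # bytes
```

Use the peak redo rate rather than the average, since the link must keep up during the busiest periods.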
If the network configuration is tuned, evaluate whether the transport lag (refer to Verify Transport Lag and Understand Redo Transport Configuration in Part 1) has dropped to acceptable levels.
If it has, you have met your goals and you can stop. Otherwise, continue with the rest of this tuning and troubleshooting section.
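As a small aid for this check, the sketch below turns a transport lag value into seconds and compares it with a target. It assumes the value is read from V$DATAGUARD_STATS on the standby and has the day-to-second form `+DD HH:MM:SS`; the query is shown only as a string, and the helper names and the 30-second target are hypothetical.

```python
import re

# Query to run on the standby with your own SQL client (not executed here).
LAG_QUERY = "SELECT value FROM v$dataguard_stats WHERE name = 'transport lag'"

# Assumed value format: '+DD HH:MM:SS', for example '+00 00:01:30'.
_LAG_RE = re.compile(r"^\+(\d+) (\d{2}):(\d{2}):(\d{2})$")


def lag_seconds(value):
    """Convert a '+DD HH:MM:SS' lag string into whole seconds."""
    match = _LAG_RE.match(value.strip())
    if match is None:
        raise ValueError(f"unexpected lag format: {value!r}")
    days, hours, minutes, seconds = (int(part) for part in match.groups())
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def lag_is_acceptable(value, target_seconds=30):
    """True when the lag is at or below the (hypothetical) target."""
    return lag_seconds(value) <= target_seconds


if __name__ == "__main__":
    sample = "+00 00:01:30"  # made-up sample value
    print(lag_seconds(sample), lag_is_acceptable(sample))
```

Sampling the lag repeatedly over a peak period gives a better picture than a single reading.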
Gather and Monitor System Resources
Gather Oracle Linux OSWatcher or Oracle Exadata ExaWatcher data to analyze system resources.
OSWatcher (oswbb) is a collection of UNIX shell scripts intended to collect and archive operating system and network metrics to aid support in diagnosing performance issues.
As a best practice, install and run OSWatcher on every node that has a running Oracle instance. If a performance issue occurs, Oracle Support can use this data to help diagnose problems that may lie outside the database.
You can download OSWatcher from OSWatcher (Doc ID 301137.1).
ExaWatcher is a utility that collects performance data on the storage servers and database servers on an Exadata system.
The data collected includes operating system statistics, such as iostat, cell statistics (cellsrvstat), and network statistics.
Tune to Meet Data Guard Resource Requirements
Redo transport can be impacted if:
- Primary or standby database is completely CPU bound
- Primary or standby database I/O system is saturated
- Network topology can't support the redo generation rates
Evaluate whether the primary database system has:
- Sufficient CPU capacity for the Log Writer Process (LGWR) to post foregrounds efficiently
- Sufficient I/O bandwidth so local log writes maintain low I/O latency during peak rates
- Network interfaces that can handle peak redo rate volumes combined with any other network activity across the same interface
- Automatic Workload Repository (AWR), Active Session History (ASH), and OSWatcher or ExaWatcher data gathered from the primary database for tuning and troubleshooting
Evaluate whether the standby database system has:
- Sufficient CPU capacity for the remote file server (RFS), the Oracle Data Guard process that receives redo at the standby database, to efficiently write to standby redo logs
- Sufficient I/O bandwidth to enable local log writes to maintain low I/O latency during peak rates
- A network interface that can receive the peak redo rate volumes combined with any other network activity across the same interface
- AWR, ASH, and OSWatcher or ExaWatcher data gathered from the standby database for tuning and troubleshooting
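The two checklists above can be applied to sampled metrics with a few comparisons. The sketch below is illustrative only: the `HostSample` fields stand for values you would read from OSWatcher, ExaWatcher, or AWR output, and every threshold is a hypothetical placeholder to be replaced with limits that suit your system.

```python
from dataclasses import dataclass


@dataclass
class HostSample:
    """Peak-period metrics for one host (primary or standby)."""
    cpu_busy_pct: float       # overall CPU utilization
    log_write_ms: float       # redo or standby redo log write latency
    nic_used_mbps: float      # redo plus all other traffic on the interface
    nic_capacity_mbps: float  # rated interface speed


def find_constraints(sample, max_cpu_pct=90.0, max_log_write_ms=5.0,
                     max_nic_fraction=0.8):
    """Return the resource areas that exceed the placeholder thresholds."""
    issues = []
    if sample.cpu_busy_pct >= max_cpu_pct:
        issues.append("cpu")
    if sample.log_write_ms > max_log_write_ms:
        issues.append("io")
    if sample.nic_used_mbps > max_nic_fraction * sample.nic_capacity_mbps:
        issues.append("network")
    return issues


if __name__ == "__main__":
    # Made-up sample for a standby with slow log writes and a busy interface.
    standby = HostSample(cpu_busy_pct=50.0, log_write_ms=12.0,
                         nic_used_mbps=900.0, nic_capacity_mbps=1000.0)
    print(find_constraints(standby))
```

Run the same check against both the primary and the standby, since a constraint on either side can hold back redo transport.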
Note: The top issue encountered with the standby database is poor standby log write latency because of insufficient I/O bandwidth. This problem can be mitigated by using Data Guard Fast Sync.
If the system configuration is tuned and the resource constraints above are removed, evaluate whether the transport lag (refer to Verify Transport Lag and Understand Redo Transport Configuration in the previous parts) has dropped to acceptable levels. If it has, you have met your goals.