Sunday, May 10, 2026

Real-time query on standby server requirements and musts - Part2


Real-time query on standby server requirements and musts 

--Part 2

In the part1, I described some solutions to ensuring Oracle standby database is synched with primary for critical reporting services. I showing that STANDBY_MAX_DATA_DELAY parameter how helps to control GAP, and review 'alter session sync with primary' affection to force a synchronization between standby and primary database.

One of another techniques is creating a specific service using tuning options:

Define your read service on standby with:

srvctl add service -d <DB_UNIQUE_NAME> -s ro_service \

  -role PHYSICAL_STANDBY \

  -preferred <standby_instance> \

  -pdb <your_pdb> \

  -commit_outcome TRUE \

  -failovertype SELECT \

  -replay_init_time 1800 \

  -rlbgoal SERVICE_TIME \

  -clbgoal SHORT \

  -maxlag 5

 

Explanation

  • -role PHYSICAL_STANDBY → ensures this service runs only on the standby (Active Data Guard).
  • -rlbgoal SERVICE_TIME → enables runtime load balancing based on actual response times.
  • -clbgoal SHORT → optimizes for short-lived connections, such as web requests.
  • -maxlag 5 → enforces a maximum 5-second apply lag; if the standby falls behind, the service is automatically taken offline.
    This parameter behaves similarly to STANDBY_MAX_DATA_DELAY=5, but at the service level instead of the instance level.
  • -commit_outcome TRUE and -failovertype TRANSACTION → enable Application Continuity for reliable transaction replay.
  • -replay_init_time 1800 → sets 30-minute window for session replay initialization.

 

 

 

 

 

Parameter

Meaning

failovertype TRANSACTION

During a failover (or replay), in-flight transactions are fully replayed. Ensures transactional integrity. Common for read-write / primary services.

failovertype SELECT

During a failover, only SELECT statements are replayed. No transactional replay. Common for read-only / standby services.

 

-clbgoal Overview

The -clbgoal parameter in srvctl add service controls how Oracle distributes client connections at connect time across available instances.

  • CLB = Connect-Time Load Balancing
  • Affects which instance a new client session is sent to when multiple instances are available for a service.
  • Typical values: SHORT or LONG

 

🔹 Value Definitions

Value

Meaning

When to Use

SHORT

Distribute connections to the instance with the fewest active sessions or shortest estimated workload. Optimized for short-lived connections (OLTP queries, quick selects, small jobs).

Web apps, dashboards, and services with many small queries.

LONG

Distribute connections considering long-running queries and current service time. Optimized for heavy, long-duration queries to avoid overloading an instance.

BI / reporting queries, large analytics queries on ADG standby.

 

🔹 How They Behave on Standby

a) TRANSACTION

  • Guarantees full transactional continuity if a session fails over from one node to another or from standby to primary.
  • All in-flight statements (including DML) are replayed.
  • Introduces slightly more overhead on session initialization and replay logic.
  • Mostly used for OLTP / primary read-write workloads with Application Continuity.

b) SELECT

  • Only read queries (SELECTs) are replayed if a failover happens.
  • No DML replay (no transactions).
  • Minimal overhead, optimized for read-only reporting.
  • Perfect for ADG standby read services where clients do not issue DML.

-commit_outcome : Enable Transaction Guard; when set to TRUE, the commit outcome for a transaction is accessible after the transaction's session fails due to a recoverable outage.

-replay_init_time:

For Application Continuity, this parameter specifies the difference between the time, in seconds, of original processing of the first operation of a request and the time that the replay is ready to start after a successful reconnect. Application Continuity will not replay after the specified amount of time has passed. This parameter is intended to avoid the unintentional processing of a transaction when a system is recovered after a long period. The default is 5 minutes (300). The maximum value is 24 hours (86400). If the -failover_type parameter is not set to TRANSACTION, then you cannot use this parameter.

 

-rlbgoal {NONE | SMART_CONN | SERVICE_TIME | THROUGHPUT}         

Runtime Load Balancing Goal (for the Load Balancing Advisory).

Set this parameter to SMART_CONN to enable Smart Connection Rebalance.

Set this parameter to SERVICE_TIME to balance connections by response time.

Set this parameter to THROUGHPUT to balance connections by throughput.

 

-maxlag 5 here works similar to STANDBY_MAX_DATA_DELAY, controlling read service availability based on lag.

 

-maxlag(maximum_lag_time ):

The default value is ANY. You must also specify the -global option.

Maximum replication lag time in seconds for a global service.

Must be a non-negative integer.

 

What is Session Draining

Session draining is a feature in Oracle that allows existing client sessions to complete gracefully before a service is stopped, relocated, or taken offline.

  • Purpose: Avoid killing active user sessions or long-running queries when:
    • Shutting down a service
    • Moving a service to another node
    • Performing maintenance on an instance

 

  • How it works:

1.                   Oracle marks the service or instance as unavailable for new connections.

2.                   Existing sessions continue running until:

      • They finish naturally, or
      • A timeout expires (-drain_timeout), after which sessions are forcibly disconnected.

This ensures zero disruption for users while allowing controlled maintenance.

 

 

When is Session Draining Used?

  • RAC (Real Application Clusters):
    • Moving a service between nodes (srvctl relocate service)
    • Stopping a node or instance (srvctl stop instance)
  • Active Data Guard:
    • Switching standby services or taking them offline
  • General maintenance:
    • Applying patches, restarting nodes, or scaling resources

 

3️ Key SRVCTL Options for Session Draining

Option

Description

Default / Notes

-drain_timeout <seconds>

Maximum time Oracle will wait for existing sessions to complete before forcefully terminating them.

Default = 300 seconds (5 min). Can be adjusted based on workload.

-drain

Tells Oracle to drain sessions instead of immediately stopping the service. Often used with srvctl stop service or relocate service.

Must be combined with -drain_timeout.

-graceful

Similar purpose; ensures in-flight sessions complete naturally (sometimes used in older SRVCTL versions).

Less commonly used in 12c+; -drain is preferred.

-force

Immediately kills sessions without waiting. Overrides session draining.

Use only if you need immediate shutdown.

 

Example Scenarios

a) Draining sessions before stopping a service

srvctl stop service -d ORCL -s app_ro_svc -drain -drain_timeout 600

  • Marks app_ro_svc as unavailable for new connections.
  • Existing sessions wait up to 600 seconds to complete.
  • After timeout, any remaining sessions are disconnected.

 

 

b) Relocating a service with session draining

srvctl relocate service -d ORCL -s app_rw_svc -oldinst orcl1 -newinst orcl2 -drain -drain_timeout 300

  • Stops routing new connections to orcl1.
  • Existing sessions finish up to 5 minutes.
  • After 300 seconds, remaining sessions are moved or terminated on orcl1.

 

c) Forcing a service stop (skip draining)

srvctl stop service -d ORCL -s app_ro_svc -force

  • Immediately terminates all sessions.
  • Useful for emergency shutdowns, but disrupts clients.

 

Best Practices

  1. Use session draining for long-running queries or critical workloads to avoid breaking users.
  2. Set -drain_timeout based on expected query duration:
    • OLTP: 30–60 seconds usually enough
    • Analytics / reporting: 300–900 seconds depending on query length
  3. Combine with service relocation for zero-downtime maintenance.
  4. Monitor sessions via:

SELECT sid, serial#, username, status FROM v$session WHERE service_name='APP_RO_SVC';

  1. For Active Data Guard reporting services, session draining is especially useful when you need to:
    • Take standby service offline
    • Apply patches or upgrade without disrupting analytics users

 

 Summary

  • Session draining = letting active sessions finish gracefully before stopping or relocating a service.
  • Key options:
    • -drain → enable session draining
    • -drain_timeout <seconds> → maximum wait for sessions to finish
    • -force → bypass draining, kill sessions immediately
  • Use draining for long-running queries, maintenance, or zero-downtime service moves.
  • Particularly important for ADG standby services and long-running reporting workloads.

 

 

In the last part (Part3) I continue another’s techniques.

 

 

 

*************** Alireza Kamrani ****************

 

 

 

 

 

Real-time query on standby server requirements and musts - Part1

Introduction:

In many Oracle environments, the physical standby database is treated as a purely passive DR asset, something you hope to never touch until a disaster happens. But modern SLAs, cost pressures, and read scalability needs are pushing teams to run real-time queries directly on standby. 

 

This sounds simple (“just open it read-only”), but in practice it raises tough questions:

 

Ø  How do we guarantee the standby is truly in sync with primary?   

Ø  What configuration parameters are mandatory (Redo Transport, Real-Time Apply, `SYNC` vs `ASYNC`, protection modes, etc.)? 

Ø  What does the right architecture look like to balance performance, latency, and consistency? 

Ø  How do we detect and prevent hidden lags, apply lag, transport lag, network issues, that can quietly invalidate “real-time” assumptions? 

Ø  What operational controls and monitoring are required so DBAs can confidently let critical reports and near real-time queries run on standby?

 

This post walks through the requirements and must-haves for safe real-time query on a standby:

 

Ø  Core architecture patterns for primary–standby setups supporting real-time query 

Ø  Key Oracle Data Guard parameters and settings that influence sync behavior 

Ø  Recommended modes, transport and apply options to minimize lag 

Ø  Practical checks, scripts, and metrics to prove the standby is really current 

 

By the end, you should have a concrete checklist of configuration, monitoring, and architectural decisions that let you treat your standby as a trustworthy, near-real-time read platform, without compromising its primary role as your last line of defense.

 

DBAs have some techniques to prepare Physical Standby server to offloading reporting, based on reporting importance we need main configuration on design, parameters, protection mode, redo apply mechanism to guarantee a real-time query processing with Zero-Gap.

Regardless of architecture and design, I review main parameters to control a user in SESSION level and internal of Data guard to achieve an exactly zero-gap and real-time query processing. This hep DBA to decrease Primary workload and also, we can use ADG for backup purpose.

 

We have some technique to prepare Standby for reporting service:

Ø  Use ADG_REDIRECT_DML to enable or disable automatic redirection of DML operations from a standby to the primary in Oracle Active Data Guard environments.

Ø  Use Active Data Guard with preferred Protection mode (based on business requirements)

Ø  Use Real-Time Apply and NODELAY Option in Media Recovery processing

But in this topic, I will demonstrate some new Options to ensuring a near 100% real-time processing for critical and financial reporting, because none of the above options guarantee that the data guard will be in sync, especially during peak hours.

Our requirement is to receive the records from the tables exactly as they are from the primary.

To what extent can this need be fulfilled?

 

Configuring Apply Lag Tolerance in a Real-time Query Environment

 

The STANDBY_MAX_DATA_DELAY session parameter can be used to specify a session-specific apply lag tolerance, measured in seconds, for queries issued by non-administrative users to a physical standby database that is in real-time query mode.

This capability allows queries to be safely offloaded from the primary database to a physical standby database, because it is possible to detect if the standby database has become unacceptably stale.

If STANDBY_MAX_DATA_DELAY is set to the default value of NONE, then queries issued to a physical standby database are executed regardless of the apply lag on that database.

If STANDBY_MAX_DATA_DELAY is set to a nonzero value, then queries issued to a physical standby database are executed only if the apply lag is less than or equal to STANDBY_MAX_DATA_DELAY. Otherwise, an ORA-3172 error is returned to alert the client that the apply lag is too large.

If STANDBY_MAX_DATA_DELAY is set to 0, a query issued to a physical standby database is guaranteed to return the exact same result as if the query were issued on the primary database, unless the standby database is lagging behind the primary database, in which case an ORA-3172 error is returned.

Use the ALTER SESSION SQL statement to set STANDBY_MAX_DATA_DELAY. For example:

SQL> ALTER SESSION SET STANDBY_MAX_DATA_DELAY=2

 

Forcing Redo Apply Synchronization in a Real-time Query Environment

To ensure that all redo data received from the primary database has been applied to a physical standby database, you can use a SQL ALTER SESSION statement.

Issue the following SQL statement:

SQL> ALTER SESSION SYNC WITH PRIMARY;

This statement blocks until all redo data received by the standby database at the time that this command is issued has been applied to the physical standby database. An ORA-3173 error is returned immediately, and synchronization does not occur, if the redo transport status at the standby database is not SYNCHRONIZED or if Redo Apply is not active.

To ensure that Redo Apply synchronization occurs in specific cases, use the SYS_CONTEXT('USERENV','DATABASE_ROLE') function to create a standby-only trigger (enabled on the primary but that only takes certain actions if it is running on a standby). For example, you could create the following trigger that would execute the ALTER SESSION SYNC WITH PRIMARY statement for a specific user connection at logon:

CREATE TRIGGER adg_logon_sync_trigger

 AFTER LOGON ON user.schema

  begin

    if (SYS_CONTEXT('USERENV', 'DATABASE_ROLE')  IN ('PHYSICAL STANDBY')) then

      execute immediate 'alter session sync with primary';

    end if;

  end;

----------------------------------------------------------

Ø  ALTER SESSION SYNC WITH PRIMARY;

Key Characteristics

Aspect

Behavior

Nature

Blocking

Guarantee

Standby is fully caught up

Failure

ORA-3173 if sync not possible

Scope

Session-specific

 

When ORA-3173 occurs:

Ø  Redo transport not SYNCHRONIZED

Ø  Redo Apply not running

Ø  Not using SYNC transport

Ø  Real-time Query Restrictions

 

 

Why Protection Mode Matters:

Both STANDBY_MAX_DATA_DELAY = 0 and SYNC WITH PRIMARY impose strict requirements on redo transport.

Supported Protection Modes

Protection Mode

Supported

Reason

Maximum Performance

 Not guaranteed

ASYNC transport

Maximum Availability

 Supported

SYNC + optional fallback

Maximum Protection

 Fully supported

SYNC + no data loss

 

Technical Dependency Matrix

Feature

Requires

STANDBY_MAX_DATA_DELAY = 0

SYNC transport

ALTER SESSION SYNC WITH PRIMARY

SYNC transport

Guaranteed identical results

Real-Time Apply

No ORA-3173

Transport status = SYNCHRONIZED

MAA compliance

Maximum Availability

 

Key MAA Controls:

Layer

Control

Redo Transport

SYNC (Max Availability)

Apply

Real-Time Apply

Query Control

STANDBY_MAX_DATA_DELAY

Hard Sync

ALTER SESSION SYNC WITH PRIMARY

Access Control

Logon triggers per user

Monitoring

Apply lag alerts

 

 

 

Choosing the Right Strategy: Decision Table

Reporting Criticality

Recommended Approach

Near-real-time dashboards

STANDBY_MAX_DATA_DELAY = 2–5

Financial / regulatory reports

SYNC WITH PRIMARY

Mixed workloads

Combination via logon trigger

Read-only analytics

No delay control

Zero data divergence

STANDBY_MAX_DATA_DELAY = 0

 

Key Takeaways

1.        Active Data Guard alone does not guarantee freshness

2.        STANDBY_MAX_DATA_DELAY = policy-based lag control

3.        ALTER SESSION SYNC WITH PRIMARY = hard consistency barrier

4.        Both features require SYNC redo transport

5.        Maximum Availability mode is mandatory for critical reporting

6.        Combining these features is a canonical Oracle MAA pattern

 

If we use Physical STBY for Critical reporting(finance) that need a query freshness guarantee, do we need these settings when using Real-Time Apply on a physical standby with a higher protection mode such as Maximum Protection?

1.       Yes — even in Maximum Protection mode, you still need
STANDBY_MAX_DATA_DELAY and/or ALTER SESSION SYNC WITH PRIMARY
if the standby is used for critical reporting.

Why?

Maximum Protection guarantees no data loss on commit
It does NOT guarantee that a query on an open standby sees the latest data at query time.

These are two different problem domains.

 

2. What Maximum Protection Actually Guarantees (and What It Doesn’t)

What Maximum Protection guarantees

Ø  LGWR SYNC redo transport

Ø  Commit on primary does not complete unless redo is written to standby SRL

Ø  Zero data loss under all failure scenarios

Ø  Redo transport status = SYNCHRONIZED

This is a transaction durability guarantee, not a query guarantee.

What Maximum Protection does NOT guarantee

Ø  That redo has been applied

Ø  That standby queries see the latest committed SCN

Ø  That a reporting session waits for apply

Ø  That standby query results are identical to primary at that instant


 Redo can be:

  • Shipped
  • Persisted
  • Not yet applied

This gap can exist even in Maximum Protection based on some situations!.

 

In Maximum Protection

Aspect

Status

Redo transport

Already SYNC

Apply lag possible

Yes

Query freshness control

Without this

Query freshness control

With this

It prevents silent stale reads
It converts freshness into a contract
It fails fast with ORA-3172 if violated

Role of ALTER SESSION SYNC WITH PRIMARY in Maximum Protection

What it adds:

ALTER SESSION SYNC WITH PRIMARY;

This does one critical thing:

It blocks the session until all received redo has been applied.

 

Why it is still needed

Even in Maximum Protection:

  • Redo arrival ≠ Redo applied
  • Reporting may start before apply catches up

This command:

  • Forces apply to catch up before query
  • Guarantees identical results to primary at that moment

Fast Sync Mode – What It Is and What It Is Not

What Fast Sync Actually Means?

Fast Sync is (SYNC NOAFFIRM) and enables all sessions waiting for a remote RFS write to proceed as soon as the write is submitted, not when the write completes. This offers the best performance since it reduces the overall SYNC remote write wait event.

It refers to:

  • LGWR SYNC + AFFIRM
  • Reduced commit latency using:
    • Network batching
    • Standby Redo Logs (SRL)
    • Optimized acknowledgment path

Fast Sync:

  • Improves commit performance
  • Does not change apply behavior
  • Does not change query consistency

 

 

 

 

 

Fast Sync Apply Synchronization

Feature

Fast Sync

Commit latency

Improved

Redo safety

Same

Apply lag

Unchanged

Query freshness

 Not guaranteed

 

So even with Fast Sync:

  • Standby may still lag in apply
  • Queries may still read stale data

 

Final MAA Guidance (Clear and Practical)

If your standby is used for critical reporting

Even in Maximum Protection, you SHOULD:

  1. Use Real-Time Apply
  2. Use STANDBY_MAX_DATA_DELAY
  3. Use ALTER SESSION SYNC WITH PRIMARY for strict use cases
  4. Enforce via logon triggers per reporting user
  5. Monitor apply lag continuously
  6. Use NODELAY Option for media Recovery

 

This list discusses restrictions related to real-time query mode.

  • The apply lag control and Redo Apply synchronization mechanisms described above require that the client be connected and issuing queries to a physical standby database that is in real-time query mode.
  • The following additional restrictions apply if STANDBY_MAX_DATA_DELAY is set to 0 or if the ALTER SESSION SYNC WITH PRIMARY SQL statement is used:
    • The standby database must receive redo data via the SYNC transport.
    • The redo transport status at the standby database must be SYNCHRONIZED and the primary database must be running in either maximum protection mode or maximum availability mode.
    • Real-time apply must be enabled.
  • Oracle Active Data Guard achieves high performance of real-time queries in an Oracle RAC environment through the use of cache fusion. This allows the Oracle Data Guard apply instance and queries to work out of cache and not be slowed down by disk I/O limitations.

 A consequence of this is that an unexpected failure of the apply instance leaves buffers in inconsistent states across all the open Oracle RAC instances. If the database is open on any other instances, one of the open instances performs Active Data Guard instance recovery to bring the database to a consistent state and all the open instances remain open.

 

Wait for Part2 & Part3 in next weeks

******************************************************************* 

Real-time query on standby server requirements and musts - Part2

Real-time query on standby server requirements and musts   --Part 2 In the part1, I described some solutions to ensuring Oracle standby da...