How To Configure Client Failover for Data Guard Connections Using Database Services
FAN vs TAF vs TAC vs AC
07/07/2025 | Written by: ALIREZA KAMRANI
Introduction:
Oracle RAC provides scalability and high availability for the
Oracle Database. If one server (RAC node) fails or is taken offline for
maintenance, the database is still accessible through the remaining nodes.
However, what happens to client sessions that are executing some work, whether
reading or changing data, when maintenance begins? That work is
interrupted and must be executed again by the end user or the application
unless you implement draining and enable Application Continuity or Transparent
Application Continuity.
From the application perspective, Fast Application Notification (FAN) tells
clients and connection pools when a service or node goes down or comes back, so
sessions can be relocated to a running node. Draining lets active sessions
finish their requests within a predefined drain timeout, and Application
Continuity replays interrupted requests for the sessions that did not drain,
that is, did not finish executing their requests in time. All of this is
transparent to end users and applications.
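To make the split between draining and replay concrete, here is a minimal sketch in plain Python. It is only a model of the idea, not Oracle code: the `Session` class, the session names, and the 300-second timeout are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Session:
    name: str
    seconds_to_finish: float  # time the in-flight request still needs


def drain(sessions, drain_timeout):
    """Split sessions into those that finish within the drain timeout
    and those still busy when it expires."""
    drained = [s.name for s in sessions if s.seconds_to_finish <= drain_timeout]
    needs_replay = [s.name for s in sessions if s.seconds_to_finish > drain_timeout]
    return drained, needs_replay


# Hypothetical sessions connected to the node that is about to be stopped.
sessions = [Session("web-1", 2.0), Session("batch-7", 900.0), Session("web-2", 0.0)]
drained, needs_replay = drain(sessions, drain_timeout=300)
print("drained:", drained)
print("left for Application Continuity:", needs_replay)
```

Sessions in the first list finish on their own and move on cleanly; only the second list depends on replay.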
Application Continuity (AC) is an Oracle Database feature
that enables the seamless and rapid replay of an in-flight request against the
database following a recoverable error that makes the database session
unusable. Its primary goal is to ensure that the interruption appears to
the end-user as nothing more than a delay in request processing. AC works
by completely reconstructing the database session after an outage, including
all states, cursors, variables, and the last transaction (if any). This
effectively masks disruptions caused by planned maintenance (e.g., patching,
configuration changes) or unplanned outages (e.g., network errors, instance
failures).
Transparent Application Continuity (TAC), introduced with Oracle
Database 18c, is an extension or mode of AC. TAC transparently tracks and
records session and transactional state, enabling the recovery of a database
session after recoverable outages. The key characteristic of TAC is its
ability to operate without requiring any application code changes or specific
knowledge of the application by the database administrator (DBA). This
transparency is achieved through a state-tracking infrastructure that categorizes
session state usage.
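On the database side, choosing between TAC and AC comes down to a few service attributes. The sketch below only assembles an `srvctl` command line in Python so the difference is easy to see. The database name `orcl` and the service name `app_tac` are hypothetical, and the flag values are written as I recall them from the `srvctl` usage text, so please check them against the reference for your release before running anything.

```python
import shlex

# Attribute values per mode (assumed from srvctl usage; verify for your release).
MODE_FLAGS = {
    "tac": ["-failovertype", "AUTO", "-failover_restore", "AUTO"],
    "ac": ["-failovertype", "TRANSACTION", "-failover_restore", "LEVEL1"],
}


def srvctl_modify_service(db, service, mode):
    """Return the srvctl argument list that switches a service to TAC or AC."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"mode must be one of {sorted(MODE_FLAGS)}")
    return (
        ["srvctl", "modify", "service", "-db", db, "-service", service]
        + MODE_FLAGS[mode]
        # Transaction Guard and FAN are switched on for both modes.
        + ["-commit_outcome", "TRUE", "-notification", "TRUE"]
    )


# "orcl" and "app_tac" are hypothetical names used only for illustration.
tac_cmd = srvctl_modify_service("orcl", "app_tac", "tac")
print(shlex.join(tac_cmd))
```

Building the command as a list keeps quoting out of the picture; `shlex.join` is used only to print it.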
Both AC and TAC can be used with Oracle Real Application Clusters
(RAC), Oracle RAC One Node, Oracle Active Data Guard, and Oracle Autonomous
Database (both shared and dedicated infrastructure). These features
enhance the fault tolerance of systems and applications by masking database
outages and recovering in-flight work that would otherwise be lost.
Without AC/TAC, database outages cause significant problems for
applications. Applications receive error messages, users are left uncertain
about the status of their transactions (e.g., money transfers, flight
reservations, orders), and middleware servers might even need restarting to
handle the surge of login requests post-outage. This leads to both
end-user dissatisfaction and operational inefficiency.
AC and TAC enable the Oracle Database, Oracle drivers, and Oracle
connection pools to collaborate, safely and reliably masking many planned and
unplanned outages. By automatically handling recoverable errors, they
improve the end-user experience and reduce the need for application developers
to write complex error-handling code. This boosts developer productivity
and keeps applications running without interruption.
The evolution from Oracle’s basic failover mechanisms (like TAF –
Transparent Application Failover) to AC and then TAC reflects a strategic shift
towards making high availability increasingly transparent and reducing
application-specific coding dependencies. TAF (pre-12c) had significant
limitations, especially around DML operations and session state
management. AC (12c) addressed DML replay but required awareness of
connection pool usage and request boundaries. TAC (18c+) further reduced
complexity by automating state tracking and boundary detection. This
progression shows Oracle recognized the adoption barriers of earlier solutions
and prioritized ease of use alongside capability. Consequently, TAC has become
Oracle’s preferred solution for modern applications, especially in cloud and
Autonomous Database environments, while AC remains relevant for specific
legacy systems or customization needs.
AC and TAC extend Oracle’s Maximum Availability Architecture (MAA) principles
to the application tier. MAA is a set of best practices, configurations, and
architectural blueprints designed to meet zero-data-loss and
zero-application-downtime goals. AC and TAC contribute to these goals by
recovering in-flight transactions so that outages stay hidden from the
application stack.
These features work in conjunction with other Oracle HA solutions
like RAC, Data Guard, and Fast Application Notification (FAN) to form the
building blocks for continuous availability. The MAA framework aims to
keep applications continuously available by hiding planned and unplanned
events, as well as load imbalances at the database tier. AC and TAC are
integral parts of this architecture, minimizing the impact of database outages
on the application.
The Replay Process: How AC/TAC Maintains and Recovers Sessions
The working mechanism of AC and TAC involves the following steps
when a recoverable error is detected:
- Error Detection: The system identifies a recoverable error (e.g., network
  interruption, temporary instance failure) that renders the session
  unusable.
- New Session Establishment: A new database session is established on
  another available database instance.
- Session State Restoration: The state of the original session before the
  interruption (non-transactional state, variables, PL/SQL package states,
  etc.) is reconstructed in the new session. This is managed through service
  parameters like FAILOVER_RESTORE and SESSION_STATE_CONSISTENCY,
  and mechanisms like Database Templates in 23ai.
- Replay of Database Calls: The database calls (SQL queries, DML operations)
  made from the beginning of the interrupted request are executed
  sequentially in the new session.
- Consistency Check and Idempotence: During replay, data consistency is
  checked. The Transaction Guard mechanism ensures that the transaction is
  committed only once (idempotence), especially if the interruption occurred
  during the COMMIT operation.
- Continuation or Error: If the replay is successful, the application
  perceives the interruption merely as a delay and continues from where it
  left off. However, if data inconsistency is detected during replay (e.g., a
  replayed query returns different results) or an unrecoverable state is
  encountered, the replay is rejected, and the application receives the
  original error. Unrecoverable errors (e.g., invalid data input) are never
  replayed.
This process ensures that the request either completes safely, with the
user seeing nothing more than a delay, or fails with the original
error state accurately reported.
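The replay steps can be sketched as a toy model in plain Python. This is not how the Oracle drivers are implemented; `Session`, `RecoverableError`, and `run_request` are hypothetical names, the "database" is just a dictionary, and only read-only calls are modelled. The point is the order of operations: new session, state restore, replay with a consistency check, then continue.

```python
class RecoverableError(Exception):
    """Stands in for an error that makes the session unusable."""


class Session:
    def __init__(self, data, fail_on_call=None):
        self.data = data                  # shared "database" contents
        self.state = {}                   # session state, e.g. settings
        self.calls_made = 0
        self.fail_on_call = fail_on_call  # simulate an outage on this call

    def query(self, key):
        self.calls_made += 1
        if self.calls_made == self.fail_on_call:
            raise RecoverableError("connection lost")
        return self.data[key]


def run_request(session, keys, new_session):
    """Run one request; on a recoverable error, replay it on a new session."""
    history = []  # (key, result) pairs the application has already seen
    try:
        for key in keys:
            history.append((key, session.query(key)))
    except RecoverableError as original_error:
        replacement = new_session()              # new session establishment
        replacement.state = dict(session.state)  # session state restoration
        for key, seen in history:                # replay calls in order
            if replacement.query(key) != seen:   # consistency check
                raise original_error             # replay rejected
        for key in keys[len(history):]:          # continue where it stopped
            history.append((key, replacement.query(key)))
    return [result for _, result in history]


data = {"a": 1, "b": 2, "c": 3}
failing = Session(data, fail_on_call=3)
failing.state["language"] = "AMERICAN"
results = run_request(failing, ["a", "b", "c"], new_session=lambda: Session(data))
print(results)
```

If the replacement session returned a different value for an already-seen query, `run_request` would re-raise the original error instead of continuing, which mirrors the "Continuation or Error" step.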
Transactional State and Idempotence
AC and TAC aim to preserve the integrity of the last transaction
during the replay of an interrupted request. This matters most when an
interruption occurs after the COMMIT command is sent but before the
acknowledgment is received, because the client cannot tell whether the
commit succeeded. This is where Transaction Guard (TG) comes into play.
Transaction Guard, enabled through the COMMIT_OUTCOME service attribute,
determines the definitive outcome of the transaction, preventing the same
transaction from being committed multiple times during replay.
AC and TAC rely on this idempotence guarantee provided by TG to
perform the replay safely.
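The "ask for the outcome before resubmitting" idea can be illustrated with a small simulation. Everything here is hypothetical: the `Database` class, the `ltxid` strings standing in for logical transaction IDs, and the `fail` switch that simulates where the connection breaks. It is a sketch of the principle, not of the Transaction Guard API.

```python
class Database:
    def __init__(self, balance=100):
        self.balance = balance
        self.committed = set()  # logical transaction IDs known to be committed

    def commit(self, ltxid, amount, fail=None):
        """Debit the balance; 'fail' simulates losing the connection."""
        if fail == "before":
            raise ConnectionError("lost before the commit was applied")
        self.balance -= amount
        self.committed.add(ltxid)
        if fail == "after":
            raise ConnectionError("commit applied, acknowledgment lost")

    def outcome(self, ltxid):
        """Definitive answer: did this logical transaction commit?"""
        return ltxid in self.committed


def safe_debit(db, ltxid, amount, fail=None):
    try:
        db.commit(ltxid, amount, fail=fail)
    except ConnectionError:
        # Resubmit only if the outcome check says the commit never happened.
        if not db.outcome(ltxid):
            db.commit(ltxid, amount)
    return db.balance


lost_ack = safe_debit(Database(), "tx-1", 30, fail="after")
lost_request = safe_debit(Database(), "tx-2", 30, fail="before")
print(lost_ack, lost_request)
```

In both failure cases the debit is applied exactly once. A naive retry without the outcome check would debit twice when only the acknowledgment was lost.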
CONCLUSION
The Oracle Autonomous Database is configured and managed for high
availability on your behalf. No additional configuration or management is
required by you.
There are a few simple steps to achieving Continuous Availability
for your applications:
• Select the ATP-D service that is appropriate for your SLAs
• Configure Fast Application Notification (FAN)
• Use the recommended connection string for your applications
• Use application best practices to optimize for draining
• Use Transparent Application Continuity or Application Continuity
for continuous service
By following these five simple steps, planned maintenance
activities will no longer require outages and unplanned events will rarely
result in failed transactions and interruptions to service.
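As an illustration of the connection string step, the helper below renders an Oracle Net connect descriptor with retry and timeout settings and one address list per site. The host names and service name are hypothetical, and the default numbers are only placeholders; take the actual recommended values from the Oracle paper linked under "More info".

```python
def connect_descriptor(service, hosts, port=1521, connect_timeout=90,
                       retry_count=50, retry_delay=3,
                       transport_connect_timeout=3):
    """Render a connect descriptor that tries each site's address list in turn."""
    address_lists = "".join(
        "(ADDRESS_LIST=(LOAD_BALANCE=on)"
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port})))"
        for host in hosts
    )
    return (
        f"(DESCRIPTION=(CONNECT_TIMEOUT={connect_timeout})"
        f"(RETRY_COUNT={retry_count})(RETRY_DELAY={retry_delay})"
        f"(TRANSPORT_CONNECT_TIMEOUT={transport_connect_timeout})"
        f"{address_lists}"
        f"(CONNECT_DATA=(SERVICE_NAME={service})))"
    )


# Hypothetical SCAN names for the primary and standby sites.
descriptor = connect_descriptor(
    "app_tac", ["primary-scan.example.com", "standby-scan.example.com"]
)
print(descriptor)
```

Listing both sites in one descriptor lets the client find the service wherever it is currently running, and the retry settings let it keep trying while the service starts on the other site.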
More info:
https://www.oracle.com/docs/tech/database/continuous-service-for-apps-on-atpd.pdf