Monday, July 7, 2025

Configure Client Failover for Data Guard Connections Using Database Services (FAN|TAF|TAC|AC)

 

How To Configure Client Failover for Data Guard Connections Using Database Services

FAN vs TAF vs TAC vs AC

07/07/2025       Written by: ALIREZA KAMRANI

 

 

Introduction:

Oracle RAC provides scalability and high availability for the Oracle Database. If one server (RAC node) fails or is taken offline for maintenance, the database is still accessible through the additional nodes. However, what happens to client sessions that are executing some work, whether reading or changing data, when maintenance begins? That work will be interrupted and need to be executed again by the end-user or the application unless you implement draining and enable Application Continuity or Transparent Application Continuity.

 

Oracle RAC provides high availability for the Oracle database. From the application perspective, Fast Application Notification (FAN) allows relocating sessions to the running node, Draining enables active sessions to finish their requests within a predefined drain timeout, and Application Continuity replays interrupted requests for the sessions that did not drain (=finish executing their requests). All this is done transparently to the end-user and applications.

 

Application Continuity (AC) is an Oracle Database feature that enables the seamless and rapid replay of an in-flight request against the database following a recoverable error that makes the database session unusable. Its primary goal is to ensure that the interruption appears to the end-user as nothing more than a delay in request processing. AC works by completely reconstructing the database session after an outage, including all states, cursors, variables, and the last transaction (if any). This effectively masks disruptions caused by planned maintenance (e.g., patching, configuration changes) or unplanned outages (e.g., network errors, instance failures).  

Transparent Application Continuity (TAC), introduced with Oracle Database 18c, is an extension or mode of AC. TAC transparently tracks and records session and transactional state, enabling the recovery of a database session after recoverable outages. The key characteristic of TAC is its ability to operate without requiring any application code changes or specific knowledge of the application by the database administrator (DBA). This transparency is achieved through a state-tracking infrastructure that categorizes session state usage.  

Both AC and TAC can be used with Oracle Real Application Clusters (RAC), Oracle RAC One Node, Oracle Active Data Guard, and Oracle Autonomous Database (both shared and dedicated infrastructure). These features enhance the fault tolerance of systems and applications by masking database outages and recovering in-flight work that would otherwise be lost.  

 

Without AC/TAC, database outages cause significant problems for applications. Applications receive error messages, users are left uncertain about the status of their transactions (e.g., money transfers, flight reservations, orders), and middleware servers might even need restarting to handle the surge of login requests post-outage. This leads to both end-user dissatisfaction and operational inefficiency.  

AC and TAC enable the Oracle Database, Oracle drivers, and Oracle connection pools to collaborate, safely and reliably masking many planned and unplanned outages. By automatically handling recoverable errors, they improve the end-user experience and reduce the need for application developers to write complex error-handling code. This boosts developer productivity and aims for uninterrupted application operation.  

The evolution from Oracle’s basic failover mechanisms (like TAF – Transparent Application Failover) to AC and then TAC reflects a strategic shift towards making high availability increasingly transparent and reducing application-specific coding dependencies. TAF (pre-12c) had significant limitations, especially around DML operations and session state management. AC (12c) addressed DML replay but required awareness of connection pool usage and request boundaries. TAC (18c+) further reduced complexity by automating state tracking and boundary detection. This progression shows Oracle recognized the adoption barriers of earlier solutions and prioritized ease of use alongside capability. Consequently, TAC has become Oracle’s preferred solution for modern applications, especially in cloud and Autonomous Database environments , while AC remains relevant for specific legacy systems or customization needs.  

 

AC and TAC extend Oracle’s MAA principles to the application tier. MAA is a set of best practices, configurations, and architectural blueprints designed to achieve zero data loss and zero application downtime goals. AC and TAC contribute to these goals by recovering in-flight transactions and the application stack.  

These features work in conjunction with other Oracle HA solutions like RAC, Data Guard, and Fast Application Notification (FAN) to form the building blocks for continuous availability. The MAA framework aims to keep applications continuously available by hiding planned and unplanned events, as well as load imbalances at the database tier. AC and TAC are integral parts of this architecture, minimizing the impact of database outages on the application.  

 

The Replay Process: How AC/TAC maintains and Recovers Sessions

The working mechanism of AC and TAC involves the following steps when a recoverable error is detected:

  1. Error Detection: The system identifies a recoverable error (e.g., network interruption, temporary instance failure) that renders the session unusable.  
  2. New Session Establishment: A new database session is established on another available database instance.  
  3. Session State Restoration: The state of the original session before the interruption (non-transactional state, variables, PL/SQL package states, etc.) is reconstructed in the new session. This is managed through service parameters like FAILOVER_RESTORE and SESSION_STATE_CONSISTENCY, and mechanisms like Database Templates in 23ai.  
  4. Replay of Database Calls: The database calls (SQL queries, DML operations) made from the beginning of the interrupted request are executed sequentially in the new session.  
  5. Consistency Check and Idempotence: During replay, data consistency is checked. The Transaction Guard mechanism ensures that the transaction is committed only once (idempotence), especially if the interruption occurred during the COMMIT operation.  
  6. Continuation or Error: If the replay is successful, the application perceives the interruption merely as a delay and continues from where it left off. However, if data inconsistency is detected during replay (e.g., a replayed query returns different results) or an unrecoverable state is encountered, the replay is rejected, and the application receives the original error. Unrecoverable errors (e.g., invalid data input) are never replayed.  

This process ensures that the user is unaffected by the interruption and the transaction is either completed safely or the original error state is accurately reported.

 

Transactional State and Idempotence

AC and TAC aim to preserve the integrity of the last transaction during the replay of an interrupted request. This becomes critical, especially when an interruption occurs after the COMMIT command is sent but before the acknowledgment is received. This is where Transaction Guard (TG) comes into play.  

Transaction Guard (TG) determines the definitive outcome (COMMIT_OUTCOME) of the transaction, preventing the same transaction from being committed multiple times during replay. 

AC and TAC rely on this idempotence guarantee provided by TG to perform the replay safely.

 

How To Configure Client Failover for Data Guard Connections Using Database Services

 

The best approach for hiding planned maintenance activities from your applications is to transparently drain work from each database workload location prior to the maintenance window for that workload location. Oracle’s connection pools and mid-tiers, including the WebLogic Server, Universal Connection Pool (UCP), OCI Session pool and ODP.NET Unmanaged Provider are Fast Application Notification (FAN) aware and therefore are notified before database services are scheduled to move to allow graceful draining of work before maintenance. FAN notification automatically triggers closing idle connections, opening new connections in the new service location, and allows a configurable time for active work to complete in the soon-to-be-shutdown service location. The major third-party JDBC mid-tiers, such as IBM WebSphere, allow for the same behavior when configured with UCP. For JDBC-based applications that cannot use UCP, Oracle provides solutions using Oracle Drivers and connection tests.

In order to hide unplanned outages resulting from a component or communication failure Oracle provides:

Notification - FAN is the first step to hiding outages. FAN notifies clients and breaks them out of their current network wait when an outage occurs. This avoids stalling applications for long network waits. Importantly, FAN also invokes rebalancing of sessions when services are available again.

Recovery – After the client is notified, Application Continuity (AC) or Transparent Application Continuity (TAC), re-establish a connection to a new workload location (which may be to the same or another instance in the Real Application Clusters (RAC) case, or a standby site in the Data Guard case) and replays in-flight (uncommitted) work when possible. By replaying in-flight work on the new location, the application can usually continue executing without knowing that any failure happened.

AC or TAC also executes during planned maintenance: for those sessions that do not drain (complete their current database operation) during the allocated drain interval.

In a Data Guard environment  primary database is open in read write mode and the standby database in read only mode for reporting purpose.

This document describes how to setup clients to connect to Data Guard databases (primary and standby) and configure automatic client failover such that in case there is role change due to switchover or failover,                                                                                      the client connections should still be valid i.e. the clients that need to connect to read only standby should always connect to the current standby irrespective of which database in the Data Guard configuration is                                                                                     currently in standby role and same for primary connections.

Oracle Application Continuity hides database interruptions from end-users and applications.

However, SQL*Plus is usually not your real application that will be used in production. Also, SQL*Plus is not a pooled application, and it’s recommended to use Connection Pools for Application Continuity.


This goal is achieved via database services.

CONNECT USING SERVICES

Services provide transparency for the underlying ATP-D infrastructure. FAN, connection data, Transparent Application Continuity (TAC), Application Continuity (AC), switchover, consumer groups and many other features and operations are predicated on the use of services. The services you use also define the primary or standby role in the underlying Data Guard environment.

Oracle’s Autonomous Database Transaction Processing Dedicated (ATP-D) offers five preconfigured services to choose from. All provide FAN and draining. TPURGENT and TP have TAC enabled by default in the ATP-D environment. An API is available to change the TAC or AC settings on all preconfigured services.

 

 

CONFIGURE CONNECTION STRING/URL FOR HIGH AVAILABILITY

Oracle recommends the Connection String/URL configuration shown below when connecting to the Oracle Autonomous Database. Connect strings embedded in the Oracle-supplied wallet are configured in this fashion. Do not use Easy Connect Naming on the client because EZCONNECT has no high-availability capabilities.

Note that the standby-scan specified below refers to the SCAN address available on the standby site specified in the Active Data Guard configuration. The driver attempts to connect to the primary site first, and if the service is not available, then attempts to connect to the service at the standby. Once the connection to the service is made, at whichever site, the Oracle driver version 12.2 and later remembers the TNS address list that offers that service and gives this site priority.

 

To help in choosing the service for batch work:

HIGH: Queries run with a Degree of Parallelism equal to CPU_COUNT. There is a limit of three concurrent queries after which statement queuing occurs.

MEDIUM: Queries run with a Degree of Parallelism of four. The maximum number of queries that can run simultaneously is (CPU_COUNT*1.25).

LOW: Queries run serially. Queueing starts when concurrent queries exceed (2*CPU_COUNT).

Use this Connection String for ALL Oracle drivers’ version 12.2 or higher:

Alias (or URL) =

(DESCRIPTION =

     (CONNECT_TIMEOUT= 90)(RETRY_COUNT=50)(RETRY_DELAY=3)(TRANSPORT_CONNECT_TIMEOUT=3)

          (ADDRESS_LIST =

         (LOAD_BALANCE=on)

         (ADDRESS = (PROTOCOL = TCP)(HOST=primary-scan)(PORT=1521)))

         (ADDRESS_LIST =

        (LOAD_BALANCE=on)

       (ADDRESS = (PROTOCOL = TCP)(HOST=standby-scan)(PORT=1521)))

       (CONNECT_DATA=(SERVICE_NAME = ATP-D SERVICE)))

 

Use the following for JDBC connections using Oracle driver version 12.1 or earlier

Alias (or URL) =

(DESCRIPTION =

       (CONNECT_TIMEOUT= 15)(RETRY_COUNT=50)(RETRY_DELAY=3)

       (ADDRESS_LIST =

       (LOAD_BALANCE=on)

       (ADDRESS = (PROTOCOL = TCP)(HOST=primary-scan)(PORT=1521)))

       (ADDRESS_LIST =

       (LOAD_BALANCE=on)

       (ADDRESS = (PROTOCOL = TCP)(HOST=standby-scan)(PORT=1521)))

      (CONNECT_DATA=(SERVICE_NAME = ATP-D SERVICE)))

 

USE FAST APPLICATION NOTIFICATION

FAN provides immediate notification to an application in the event of an outage or resumption of service.

Without FAN, applications can hang on TCP/IP timeout following hardware and network failures, and omit to rebalance when resources resume.

All Oracle pools and all Oracle application servers use FAN. Third-party JAVA application servers can use UCP to enable FAN.

No application changes are required to use FAN. These are configuration changes only.

For continuous service during planned maintenance, use FAN with:

• Oracle pools or

• UCP with third-party JDBC application servers or

• The latest Oracle client drivers

For continuous service during unplanned outages, use FAN with

• Application Continuity or

• Transparent Application Continuity

 

USE RECOMMENDED APPLICATION PRACTICES TO ALLOW DRAINING

Best practice for application usage is to check out connections for the time that they are needed, and then check them back in to the pool when the current action is complete. This is important to achieve good performance, for the rebalancing of work at runtime, and during maintenance windows for draining the work. Refer to the statistics section Understanding your protection level when using TAC or AC for an indication of how well your application follows this practice.

Oracle recommends using a FAN-aware Oracle connection pool for hiding planned maintenance. There is no impact to users when your application uses an Oracle Pool with FAN and returns connections to the pool between requests. You do not need to make any application changes to use FAN.

When an Oracle connection pool receives the FAN event for planned downtime, it marks all connections at the instance to be drained. Immediately, checked-in connections are closed so that they are not re-used. As in-use connections are returned to the pool they are closed. This allows all connections to be closed gracefully over time.

If you are using a third-party, Java-based application server, then the most effective method to achieve draining and failover is to replace the pooled data source with UCP. Many application servers support this approach, including Oracle WebLogic Server, IBM WebSphere, IBM Liberty, Apache Tomcat, Red Hat WildFly (JBoss), Spring, Hibernate, and others. White papers from Oracle and other providers, such as IBM, describe how to use UCP with these application servers. Using UCP as the data source allows UCP features such as Fast Connection Failover, Runtime Load Balancing, Application Continuity and Transparent Application Continuity to be used with full certification.

FAN Coverage

FAN events are integrated with:

• Oracle Fusion Middleware and Oracle WebLogic Server

• Oracle Data Guard Broker

• Oracle JDBC Universal Connection Pool or Driver for both JDBC thin and Oracle Call Interface (OCI) interfaces

• ODP.NET Connection Pool for Unmanaged and Managed Providers

• Oracle Tuxedo

• SQL*Plus

• Oracle Database drivers for languages such as Python, Node.js and PHP

• Global Data Services

• Third party JDBC application servers using Oracle JDBC Universal Connection Pool

• Listeners

 

To enable FAN in the client:

Use the TNS alias or URL shown in the preceding discussion. This connection string is used to auto-configure the Oracle Notification Service (ONS) subscription at the client for FAN-event receipt when using an Oracle Database 12c or later client driver. For older drivers, refer to the FAN white paper in the Appendix for configuration details. ONS provides a secure communication path between the database tier and the client-tier allowing the client to be notified of service availability (components stopping or starting) as well as runtime load balancing advice for better work placement during normal operation.

Depending on the client, enable FAN in the application configuration properties as follows

Universal Connection Pool or JDBC thin driver (starting 12.2)

              Set the property FastConnectionFailoverEnabled

WebLogic Active GridLink for Oracle RAC

             FAN and Fast Connection Failover are enabled by default

Oracle WebLogic Server, IBM WebSphere, IBM Liberty, Apache Tomcat, Red Hat WildFly (JBoss), JDBC Applications

              Use Universal Connection Pool as a connection pool replacement

ODP.Net clients (Managed and Unmanaged Providers)

              Set “HA events = true;pooling=true” in the connect string if using ODP.Net 12.1 or earlier

Oracle Call Interface (OCI) clients and OCI-based drivers

             Oracle Call Interface (OCI) clients without native settings can use an oraacces.xml file and set events to true

            Python, Node.js and PHP have native options. In Python and Node.js you can set an events mode when creating a connection pool. In PHP, edit php.ini add the entry oci8.events=on

            SQL*Plus enables FAN by default

In the ATP-D environment ONS offers an optional TLS (wallet-based) authentication. Depending upon the type of application (JDBC or Oracle Call Interface), the wallet configuration must follow particular rules

 

Transparent Application Continuity Coverage

Transparent Application Continuity for Oracle Autonomous Database supports the following clients:

It is strongly recommended to use the latest client drivers. Oracle Database 19c client drivers and later provide full support for TAC.

• Oracle JDBC Replay Driver 18c or later. This is a JDBC driver feature provided with Oracle Database 18c for Application Continuity

• Oracle Universal Connection Pool (UCP) 18c or later with Oracle JDBC Replay Driver 18c or later.

• Oracle WebLogic Server Active GridLink, or third-party JDBC application servers using UCP with Oracle JDBC Replay Driver 18c or later

• Java connection pools or standalone Java applications using Oracle JDBC Replay Driver 18c or later

• Oracle Call Interface Session Pool 19c or later

• SQL*Plus 19c (19.3) or later

• ODP.NET pooled, Unmanaged Driver 18c or later (“Pooling=true” default in 12.2 and later)

• Oracle Call Interface based applications using 19c OCI driver or later

 

How to handle these requirements based on database version?


In 11gR2, we have the concept of role-based database services. For Data Guard environments running older release, this is achieved via a database startup trigger.

Note: Starting from 12cR1 (12.1), you can consider using Global Data Services to seamlessly manage client connections for dataguard or Golden Gate replicated database

  

Solution

Pre 11.2 Configuration:

+ On the current primary, create 2 services, one to connect to the primary (prim_db) and another (stby_db) to connect to the read only standby:

SQL> exec dbms_service.create_service('prim_db','prim_db');

SQL> exec dbms_service.create_service('stby_db','stby_db');


+ On the current primary, start the service that is needed to connect to primary:

SQL> exec dbms_service.start_service('prim_db');


+ Now, on the current primary, create the trigger to manage these services accordingly:

CREATE OR REPLACE TRIGGER startDgServices after startup on database

DECLARE

  db_role VARCHAR(30);

  db_open_mode VARCHAR(30);

BEGIN

  SELECT DATABASE_ROLE, OPEN_MODE INTO db_role, db_open_mode FROM V$DATABASE;

  IF db_role = 'PRIMARY' THEN DBMS_SERVICE.START_SERVICE('prim_db'); END IF;

  IF db_role = 'PHYSICAL STANDBY' AND db_open_mode LIKE 'READ ONLY%' THEN DBMS_SERVICE.START_SERVICE('stby_db'); END IF;

END;

/


+ Note down the current online redo log sequence on primary and switch the current logfile:

SQL> select thread#, sequence# from v$log where status = 'CURRENT';

SQL> alter system archive log current;


Ensure that the archive with sequence# which was shown as current is shipped and applied on standby. This ensures that the redo from CREATE TRIGGER command is applied on standby.
Now, shutdown and startup the standby database to make the trigger take effect:

SQL> shut immediate;

SQL> startup;


+ Configure client TNSNAMES.ORA entry. In below example PRIM_DB should be used by clients that need to connect to primary database.

STBY_DB should be used by clients that need to access the read only standby database:

 

PRIM_DB =

  (DESCRIPTION =

    (ADDRESS_LIST =

      (FAILOVER = ON)

      (LOAD_BALANCE = OFF)

      (ADDRESS = (PROTOCOL = TCP)(HOST = primary.oracle.com)(PORT = 1521))

      (ADDRESS = (PROTOCOL = TCP)(HOST = standby.oracle.com)(PORT = 1521))

    )

    (CONNECT_DATA =

      (SERVICE_NAME = prim_db)

    )

  )

 

STBY_DB =

  (DESCRIPTION =

    (ADDRESS_LIST =

      (FAILOVER = ON)

      (LOAD_BALANCE = OFF)

       (ADDRESS = (PROTOCOL = TCP)(HOST = standby.oracle.com)(PORT = 1521))

       (ADDRESS = (PROTOCOL = TCP)(HOST = primary.oracle.com)(PORT = 1521))

    )

    (CONNECT_DATA =

      (SERVICE_NAME = stby_db)

    )

  ) 

 

Configuration in 11.2 or later release:

We will be using role-based database services introduced in 11.2. To use role-based services, Oracle Clusterware must be installed and active on the primary and standby sites for both single instance (using Oracle Restart) and Oracle RAC databases.

+ On the primary and standby hosts create the service (prim_db) that the clients will use to connect to the primary database.

 The service should be created such that it is associated with and runs on the database when it is in the ‘PRIMARY’ database role:

 

On primary:
[oracle@vmOraLinux ~]$ srvctl add service -d ora12c -s prim_db -l PRIMARY -e SESSION -m BASIC -w 10 -z 10

On standby:
[oracle@vmOraLinux ~]$ srvctl add service -d stby12c -s prim_db -l PRIMARY -e SESSION -m BASIC -w 10 -z 10

 

+ Start the primary database service (prim_db) on the current primary:

 

[oracle@vmOraLinux admin]$ srvctl start service -d ora12c -s prim_db


+ Next, on the primary and standby hosts create the service (stby_db) that the clients will use to connect to the read only standby database.

The service should be created such that it is associated with and runs on the database when it is in the ‘PHYSICAL_STANDBY’ database role:

 

On primary:
[oracle@vmOraLinux ~]$ srvctl add service -d ora12c -s stby_db -l PHYSICAL_STANDBY -e SESSION -m BASIC -w 10 -z 10

On standby:
[oracle@vmOraLinux ~]$ srvctl add service -d stby12c -s stby_db -l PHYSICAL_STANDBY -e SESSION -m BASIC -w 10 -z 10


+ Now, start the standby service (stby_db) on standby host:

 

[oracle@vmOraLinux ~]$ srvctl start service -d stby12c -s stby_db

If you get below error when trying to start the service on standby:

 

PRCD-1084 : Failed to start service stby_db

PRCR-1079 : Failed to start resource ora.stby12c.stby_db.svc

CRS-5017: The resource action "ora.stby12c.stby_db.svc start" encountered the following error:

ORA-44317: database open read-only

ORA-06512: at "SYS.DBMS_SERVICE", line 478

ORA-06512: at "SYS.DBMS_SERVICE", line 229

ORA-06512: at line 1

 

Then as workaroud, follow below steps:

 

a. Start the service on primary instead:

[oracle@vmOraLinux ~]$ srvctl start service -d ora12c -s stby_db

 

b. Perform a few log switches on primary and allow standby to catch up.

 

c. Now, the service should start on standby:

[oracle@vmOraLinux ~]$ srvctl start service -d stby12c -s stby_db

 

d. Stop the service on primary:

[oracle@vmOraLinux ~]$ srvctl stop service -d ora12c -s stby_db

         
+ Configure client TNSNAMES.ORA entry. In below example PRIM_DB should be used by clients that need to connect to primary database.

STBY_DB should be used by clients that need to access the read only standby database:

 

PRIM_DB =
  (DESCRIPTION =
     (ADDRESS_LIST =
       (FAILOVER = ON)
       (LOAD_BALANCE = OFF)
       (ADDRESS = (PROTOCOL = TCP)(HOST = primary.oracle.com)(PORT = 1521))
       (ADDRESS = (PROTOCOL = TCP)(HOST = standby.oracle.com)(PORT = 1521))
     )
     (CONNECT_DATA =
       (SERVICE_NAME = prim_db)
     )
  )

STBY_DB =
  (DESCRIPTION =
     (ADDRESS_LIST =
       (FAILOVER = ON)
       (LOAD_BALANCE = OFF)
       (ADDRESS = (PROTOCOL = TCP)(HOST = standby.oracle.com)(PORT = 1521))
       (ADDRESS = (PROTOCOL = TCP)(HOST = primary.oracle.com)(PORT = 1521))
     )
    (CONNECT_DATA =
       (SERVICE_NAME = stby_db)
    )
  )

 

CONCLUSION

The Oracle Autonomous Database is configured and managed for high availability on your behalf. No additional configuration or management is required by you.

There are a few simple steps to achieving Continuous Availability for your applications:

• Select the ATP-D service that is appropriate for your SLA’s

• Configure Fast Application Notification (FAN)

• Use the recommended connection string for your applications

• Use application best practices to optimize for draining

• Use Transparent Application Continuity or Application Continuity for continuous service

 

By following these five simple steps, planned maintenance activities will no longer require outages and unplanned events will rarely result in failed transactions and interruptions to service.

 

More info:

https://www.oracle.com/docs/tech/database/continuous-service-for-apps-on-atpd.pdf

No comments:

Post a Comment

Best Practices for Configuring Far Sync in Oracle Data Guard

  Best Practices for Configuring Far Sync in Oracle Data Guard Alireza Kamrani     14/07/2025 Active Data Guard Far Sync...