How to create Oracle Database Services for High Availability
You can visit my technical group on LinkedIn
https://www.linkedin.com/groups/8151826
Introduction
To use High Availability features like Fast Application Notification (FAN), Draining, and Application Continuity, you need to create a custom database service and set the high availability attributes accordingly. This is not possible when using the default database service, which is the service with the same name as the PDB. Additionally, custom database services provide location transparency for the application.
FEATURE SET FOR KEEPING YOUR APPLICATIONS CONTINUOUSLY AVAILABLE
Applications achieve continuous availability when planned maintenance, unplanned outages, and load imbalances of the database tier are hidden.
Oracle provides a set of features that you can choose from to keep your application available during planned events, unplanned outages and load imbalances.
You can think of these features as an insurance policy protecting your applications from service interruptions.
The best features are those that are fully transparent to your application so your application developers can focus on building functionality, not infrastructure, and that continue to protect the application when it changes in the future. We call this future-proofing.
Start with the feature set:
Draining and Rebalancing Sessions for Planned Maintenance:
When planned maintenance starts, sessions that need to be drained from an instance, PDB, or database are marked to be drained. Idle sessions are released gradually. Active sessions are drained when the work executing in that session completes. Draining of sessions is in wide use with Oracle connection pools and mid-tiers configured for Fast
Application Notification (FAN):
Starting with Oracle Database 18c, the database itself drains sessions when PDBs and instances are stopped or relocated. Draining is always the best solution for hiding planned maintenance. Failover solutions such as Application Continuity are the fallback when work will not drain in the time allocated.
Transparent Application Failover (TAF):
TAF is a feature dating back to Oracle8i. Following an instance failure, TAF creates a new session and, when using SELECT mode, on demand, replays queries back to where they were before the failure occurred. Starting with Oracle Database 12.2, TAF offers FAILOVER_RESTORE, matching Application Continuity, to restore the initial session state before queries are replayed. Cursors are replayed using the state re-established initially. Applications using TAF must not change session state later in the session, (for example PLSQL, temp tables, temp lobs, sys context) as this session state is not restored.
Application Continuity (AC):
Application Continuity hides outages starting with Oracle database 12.1 for thin Java-based applications and Oracle Database 12.2.0.1 for OCI and ODP.NET based applications. Application Continuity rebuilds the session by recovering the session from a known point which includes session states and transactional states. Application Continuity rebuilds all in-flight work. The application continues as it was, seeing a slightly delayed execution time when a failover occurs. The standard mode for Application Continuity is for OLTP-style pooled applications. Transparent Application Continuity (TAC) Starting with Oracle Database18c, Transparent Application Continuity (TAC) transparently tracks and records session and transactional state so the database session can be recovered following recoverable outages. This is done with no reliance on application knowledge or application code changes, allowing Transparent Application Continuity to be enabled for your applications. Application transparency and failover are achieved by consuming the state-tracking information that captures and categorizes the session state usage as the application issues user calls.
Transparent Application Continuity with Draining is the recommended solution for Continuous Availability.
If you are using a 12c driver, or customization is required for initial states or side effects, then you should use Application Continuity.
TAF continues to be available and is fully supported.
INSTRUCTIONS FOR APPLICATION CONFIGURATION :
Follow these instructions when implementing your solution.
1. Use an Oracle Clusterware managed service that is not the default database service (the default service has the same name as the database or PDB). The services that you create provide location transparency and HA features.
2. Use the recommended Connection String (explained later in this paper) with built in timeouts, retries and delay so incoming connections do not see errors during outages.
3. Fast Application Notification (FAN) is a mandatory component to initiate draining, to break out of failures, and to rebalance sessions when services resume and when load imbalances occur. For outages such as node and network failures, fast failover of the application does not happen if the client is not interrupted by FAN.
FAN applies to all failover solutions. When configuring FAN, use Auto-configuration of ONS. Use the recommended TNS format strictly. Do not alter this format. (Exception: if your client is pre-12c, you will manually configure FAN.)
4.Before maintenance starts, drain your work from the instances or nodes targeted for maintenance. Enable FAN with Oracle Connection Pools or Connection tests (or both). Oracle connection pools with FAN are the best solution as pools provides full life cycle of session movement. That is, draining and rebalancing of work as maintenance progresses. When using FAN, return your connections to the pool. If you are using server draining (the alternate plan explained later in this paper) and your test is not a standard test, add your test to the server using DBMS_APP_CONT_ADMIN. Sessions that do not drain within the DRAIN_TIMEOUT will be failed over.
5. The standard solution for failing over sessions is Transparent Application Continuity (TAC). Use Application Continuity (AC) if you are using Oracle Database 12c Release 2, or you want to customize with side effects or callbacks, or have an application that uses state such as temporary tables and never cleans up. Use Transparent Application Failover (TAF) if your application is read only and does not change Oracle session state in the session after the initial setup.
BUILDING BLOCKS FOR CONTINUOUS AVAILABILITY OF YOUR APPLICATION :
Use Services
Service is a logical abstraction for managing work. Services allow applications to benefit from the reliability of the redundant parts of the MAA system. The services hide the complexity of the underlying system from the client by providing a single system image for managing work.
The service is :
• A unit for management:
a handful of services are manageable, many nodes, instances, listeners, and network interfaces are not manageable. The service provides location transparency for sites and databases.
• A unit for availability:
resources are recovered quickly, independently and in parallel for each service and without the need to start entire software stacks; and
• A unit for performance: work is routed transparently across the MAA system according to service quality and priority. Services are measured against service level thresholds and violations are reported to management with advised solutions in AWR.
Configure the Connection String or URL for High Availability:
All Oracle-supplied connect strings will conform to the following recommendations. There is no need to do anything if you use the Oracle-supplied wallet.
The following TNS/URL configuration is recommended for use for connecting at failover, switchover, fallback and basic startup. Set RETRY_COUNT, RETRY_DELAY, CONNECT_TIMEOUT and TRANSPORT_CONNECT_TIMEOUT parameters in the tnsnames.ora file or the URL to allow connection requests to wait for the service and connect successfully. Connection attempts and retries are managed by Oracle Database Net Services.
Set CONNECT_TIMEOUT to a high value to prevent login storms such as 90s or 120s.
Low values can result in frenzied login-attempts due to the application or pool cancelling and retrying connection attempts.
Do not set (RETRY_COUNT+1)*RETRY_DELAY or CONNECT_TIMEOUT larger than your response time SLA.
The application should either connect or receive an error within the response time SLA.
These are general recommendations for configuring the connections for high availability. Do not use Easy Connect Naming on the client as EZCONNECT has no high availability capabilities. Note that the standby-scan specified below refers to the SCAN address available on the STANDBY site specified in your (Active) Data Guard configuration. Attempt will be made to connect to the PRIMARY site first, and if the service is not available, attempt to connect to this service at the standby. Once the location of the service is known, Oracle drivers 12.2 and later remember the address_list with that service offered and chooses this first, until the service next moves.
Adding the standby-scan to TNS connection descriptor to transparently fail over to the standby-scan is optional.
Failing over to a standby database within the same region will have acceptable performance in most cases versus failing over to a standby database in a different region where additional network latency may result in unacceptable response time performance.
In the latter case, a site failover operation will be required which involves DNS failover to another region containing mid-tier resources and standby database. Use this Connection String for ALL Oracle driver version 12.2 or higher:
Alias (or URL) = (DESCRIPTION = (CONNECT_TIMEOUT= 90)(RETRY_COUNT=50)(RETRY_DELAY=3)(TRANSPORT_CONNECT_TIMEOUT=3) (ADDRESS_LIST = (LOAD_BALANCE=on) (ADDRESS = (PROTOCOL = TCP)(HOST=primary-scan)(PORT=1521))) (ADDRESS_LIST = (LOAD_BALANCE=on) (ADDRESS = (PROTOCOL = TCP)(HOST=standby-scan)(PORT=1521))) (CONNECT_DATA=(SERVICE_NAME = YOUR SERVICE)))
FAST APPLICATION NOTIFICATION (FAN) :
FAN must be used. FAN is a required component for interrupting the application to failover. Without FAN, applications can hang on TCP/IP timeout following hardware and network failures, and omit to rebalance when resources resume. All Oracle pools and all Oracle application servers use FAN. Third-party JAVA application servers can use UCP to enable FAN. No application changes are required to use FAN. These are configuration changes only.
For continuous service during planned maintenance, use FAN with:
• Oracle pools or
• UCP with third-party JDBC application servers or
• The latest Oracle client drivers For continuous service during unplanned outages, use FAN with • Application Continuity or
• Transparent Application Continuity The format of the ADDRESS_LIST described above is important for a number of reasons, one being that this format allows for autoconfiguration of ONS. ONS is used to propagate FAN events to mid-tier pools and clients.
When a database connection is made, database-tier ONS information is sent back to the mid-tier, allowing the mid-tier to establish ONS communication paths automatically.
This means that configuration can be dynamic and need not be maintained at the mid-tier. ONS connections will be made from both the primary and standby sites.
To setup FAN ON clients:
Use the TNS alias or URL shown in the this discussion.
This connection string is used to auto-configure the Oracle Notification Service (ONS) subscription at the client for FAN-event receipt when using an Oracle Database 12c or later client driver.
For older drivers, refer to the FAN white paper in the Appendix for configuration details. ONS provides a secure communication path between the database tier and the client-tier allowing the client to be notified of service availability (components stopping or starting) as well as runtime load balancing advice for better work placement during normal operation.
To enable FAN in the client:
1. Use the TNS alias or URL provided. This connect string will auto-configure ONS (auto-ONS) subscription at the client for FAN event receipt when using a 12c driver or later.
For older drivers, refer to the FAN white paper.
2. Depending on the client, enable FAN in the application configuration properties as follows
2a-Universal Connection Pool or JDBC thin driver (starting 12.2) :
Set the property FastConnectionFailoverEnabled
2b-WebLogic Active GridLink for Oracle RAC :
FAN and Fast Connection Failover are enabled by default
2c-Oracle WebLogic Server, IBM WebSphere, IBM Liberty, Apache Tomcat, Red Hat WildFly (JBoss), JDBC Applications :
Use Universal Connection Pool as a connection pool replacement
3c- ODP.Net clients (Managed and Unmanaged Providers)
Set “HA events = true;pooling=true”
in the connect string if using ODP.Net 12.1 or earlier
4c-Oracle Call Interface (OCI) clients and OCI-based drivers :
Oracle Call Interface (OCI) clients without native settings can use oraacces.xml and set events to true Python, Node.js and PHP have native options.
In Python and Node.js you can set an events mode when creating a connection pool.
In PHP, edit php.ini add the entry oci8.events=on SQL*Plus enables FAN by default
3. You can configure ONS to use TLS (wallet-based) authentication.
For JDBC applications:
a) Ensure the following JAR files are present in your application’s CLASSPATH : (ons.jar,osdt_cert.jar,osdt_core.jar,oraclepki.jar). The Oracle ATP-D service always uses TLS.
b) Specify the wallet for FAN do one of the following:
• To use AUTO-ONS with wallets an application must set the following Java system properties: “-Doracle.ons.walletfile=/replace_this_with_host_path/onswallet”
and
“Doracle.ons.walletpassword=myONSwalletPassword”
This cannot be set on a per-pool or per-connection basis
• To use explicit ONS configuration (instead of AUTO-ONS) do one of the following:
i) Programmatically within UCP, call setONSConfiguration(), for example
ii) Set declaratively using an UCP XML
configuration file:
ii) For Oracle Call Interface (OCI) applications Oracle client drivers 12.2 or later are required:
The <wallet_location> path should be the name of the directory containing the wallet.
Other parameters may be set in the ons section of oraaccess.xml, including <hosts>, <max_connections>, and <subscription_wait_timeout>.
Drivers that support native event setting controls may omit the <events> section and use the driver setting instead.
By default connections will be established to the database even if ONS fails.
If you prefer connections to fail in this scenario, you can add a section to the same level as <events> and <ons>:
<fan>
<subscription_failure_action>
error
</subscription_failure_action>
</fan>
Place the oraaccess.xml file in the same directory as the tnsnames.ora and sqlnet.ora network files. For example, when using Oracle Instant Client these files might be in the default directory network/admin. Alternatively, all network configuration files can be put in another accessible directory. Then set the environment variable TNS_ADMIN to that directory name.
FAN and Draining
Service status changes like stopping and starting the service can be propagated to Oracle client drivers and Oracle connection pools using FAN, a high-availability notification mechanism. FAN events are published using Oracle Notification Service (ONS).
The clients take immediate action accordingly, e.g., to balance connections or to establish new connections and route work to available database instances or where the service is started.
Draining means allowing connected sessions to complete their current work in a specific duration (drain timeout) before maintenance is started.
Set the database service attribute notification to true to enable FAN and drain_timeout to time_in_seconds to enable draining:
srvctl add service -db <db_unique_nam> -pdb <pdb_name> -service <service_name>
-preferred <instance_name>
-available <instance_name>
-stopoption IMMEDIATE
-notification true
-drain_timeout 300
#example
srvctl add service -db RACCDB_POP -pdb PDB1 -service drainsrv -preferred RACCDB1,RACCDB2 -stopoption IMMEDIATE -notification true -drain_timeout 300
#start the service
srvctl start service -db RACCDB_POP -service drainsrv
#check status
srvctl status service -db RACCDB_POP -service drainsrv
Service drainsrv is running on instance(s) RACCDB1,RACCDB2
When you stop the service on instance 1, RACCDB1:
#stop service on instance 1
srvctl stop service -db RACCDB_POP -service drainsrv -instance RACCDB1
#check status
srvctl status service -db RACCDB_fra -service drainsrv
Service drainsrv is running on instance(s) RACCDB2
You can specify a different drain_timeout when stopping the service:
srvctl stop service -db RACCDB_POP -service drainsrv -instance RACCDB1 -drain_timeout 10
New connections will immediately be directed to instance 2.
Existing sessions to instance 1 remain connected to instance 1 for the duration of drain_timeout and can finish their work (commit) during this time without any interruption.
Ideally, all sessions drain (=finish their work) within the drain_timeout specified. You can set the drain_timeout as high as needed for your application, e.g., 30 minutes or 1 hour. After that, the database node will be ready for maintenance, and the application will not encounter any interruption if all sessions drain.
Transaction Guard:
Transaction Guard (TG) is a database feature for applications to ensure that every transaction has been executed at most once in case of planned or unplanned outage. In the background each transaction is associated with a logical transaction IDthat let you determine, afterward, if the transaction has committed and has been completed. Without Transaction Guard (TG) the typical example that is given is a transaction that is increasing by 5% the salary of all company employees. If the application fail, outside of the fact that you will make happy employees by giving them Double% you cannot rerun blindly the transaction from scratch. With Transaction Guard you can know from application layout how it went and present a warning or gray the submit button to avoid a second dramatic rerun.
By default the Java connection has autocommit property set to TRUE
From Oracle Technology Network the requirements for TG are:
-Use Oracle Database Release 12.1 or later.
-Use an application service for all database work.
Create the service using srvctl if using RAC or DBMS_SERVICE if not using RAC. You may also use GDSCTL.
-Set the following properties on the service – COMMIT_OUTCOME = TRUE for Transaction Guard.
-Grant execute permission on DBMS_APP_CONT package to the application user.
-Increase DDL_LOCK_TIMEOUT if using Transaction Guard with DDL statements (for example, 10 seconds).
Problem That Transaction Guard Solves
In applications without Transaction Guard, a fundamental problem for recovering applications after an outage is that the commit message that is sent back to the client is not durable.
If there is a break between the client and the server, the client sees an error message indicating that the communication failed. This error does not inform the application if the submission executed any commit operations, if a procedural call completed and executed all expected commits and session state changes, or if a call failed part way through or, yet worse, is still running disconnected from the client.
Without Transaction Guard, it is impossible or extremely difficult to determine the outcome of the last commit operation, in a guaranteed and scalable manner, after a communication failure to the server. If an application must determine whether the submission to the database was committed, the application must add custom exception code to query the outcome for every possible commit point in the application. Given that a system can fail anywhere, this is almost impractical because the query must be specific to each submission. After an application is built and is in production, this is completely impractical. Moreover, a query cannot give the correct answer because the transaction could commit immediately after that query executed. Indeed, after a communication failure the server may still be running the submission not yet aware that the client has disconnected. For PL/SQL or Java in the database, for a procedural submission, there is also no record as to whether that submission ran to completion or was aborted part way through. While such a procedure may have committed, subsequent work may not have been done for the procedure.
Failing to recognize that the last submission has committed, or will commit sometime soon or has not run to completion, can lead applications that attempt to replay, thus causing duplicate transaction submissions and other forms of "logical corruption" because the software might try to reissue already persisted changes.
Without Transaction Guard, if a transaction has been started and commit has been issued, the commit message that is sent back to the client is not durable. The client is left not knowing whether the transaction committed. The transaction cannot be validly resubmitted if the nontransactional state is incorrect or if it already committed. In the absence of guaranteed commit and completion information, resubmission can lead to transactions applied more than once and in a session with the incorrect state.
Solution That Transaction Guard Provides
Effective with Oracle Database 12c Release 1 (12.1.0.1), Transaction Guard provides new, integrated tools for applications to use to achieve idempotence automatically and transparently, and in a manner that scales. Its key features are the following:
- Durability of COMMIT outcome by saving a logical transaction identifier (LTXID) at commit for all supported transaction types against the database (Oracle Database 12cRelease 1 (12.1.0.1) or later). This includes idempotence for transactions executed using autocommit, from inside PL/SQL, from remote transactions, One-Phase XA transactions, and from callouts that cannot otherwise be identified using generic means.
- Use of the LTXID to support at-most-once execution semantics, such that database transactions protected by logical transaction identifiers cannot be duplicated when there are multiple copies of that transaction in flight identified by the LTXID.
- Blocking of a commit of in-flight work to ensure that regardless of the outage situation, another submission of the same transaction protected by that LTXID cannot commit.
- Identification of whether work committed at an LTXID was committed as part of a top-level call (client to server), or was embedded in a procedure (such as PL/SQL) at the server. An embedded commit state indicates that while a commit completed, the entire procedure in which the commit executed has not yet run to completion. Any work beyond the commit cannot be guaranteed to have completed until that procedure itself returns to the database engine.
- Identification of whether the database to which the commit resolution is directed is ahead of, in sync with, or behind the original submission, and rejection when there are gaps in the submission sequence of transactions from a client. It is considered an error to attempt to obtain an outcome if the server or client are not in sync on an LTXID sequence.
- A callback on the JDBC Thin client driver that fires when the LTXID changes. This can be used by higher layer applications such as WebLogic Server and third parties to maintain the current LTXID ready to use if needed.
- Namespace uniqueness across globally disparate databases and across databases that are consolidated into a Multitenant infrastructure. This includes Oracle Real Application Clusters (Oracle RAC) and RAC One, Data Guard, and Multitenant databases.
- Service name uniqueness across global databases and across databases that are consolidated into a Multitenant infrastructure. This ensures that connections are properly directed to the transaction information.
At-Most-Once Execution
Transaction Guard uses the logical transaction identifier (LTXID) to avoid duplicate transactions. This ability to ensure at most one execution of a transaction is referred to as transaction idempotence. The LTXID is persisted on commit and is reused following a rollback. During normal runtime, an LTXID is automatically held in the session at both the client and server for each database transaction. At commit, the LTXID is persisted as part of committing the transaction.
The at-most-once protocol requires that the database maintain the LTXID for the retention period agreed for replay. The default retention period is 24 hours, although you might need a shorter or longer period, conceivably even a week or longer. The longer the retention period, the longer the at-most-once check blocks an old transaction using an old LTXID from replay. The setting is available on each service. When multiple databases are involved, as is the case when using Data Guard and Active Data Guard, the LTXID is replicated to each database involved through the use of redo.
The getLTXID API, provided for Oracle JDBC Thin (with similar APIs for OCI, OCCI, and ODP.NET clients), lets an application retrieve the logical transaction identifier that was in use on the dead session. This is needed to determine the status of this last transaction.
The DBMS_APP_CONT.GET_LTXID_OUTCOME PL/SQL subprogram lets an application find the outcome of an action for a specified logical transaction identifier. Calling DBMS_APP_CONT.GET_LTXID_OUTCOME may involve the server blocking the LTXID from committing so that the outcome is known. This is a requirement if a transaction using that LTXID is in flight or is about to commit. An application using Transaction Guard obtains the LTXID following a recoverable error, and then calls DBMS_APP_CONT.GET_LTXID_OUTCOMEbefore attempting a replay.
The Oracle PL/SQL supplied package to know if a transaction has committed and has completed is DBMS_APP_CONT.GET_LTXID_OUTCOME:
SQL> DESC DBMS_APP_CONT
PROCEDURE GET_LTXID_OUTCOME
Argument TYPE IN/OUT
--------- --------- ------
CLIENT_LTXID RAW IN
COMMITTED BOOLEAN OUT
USER_CALL_COMPLETED BOOLEAN OUT
This procedure has two Boolean out parameters and until recently it was possible to map a PL/SQL Boolean in Java from Oracle 12cR2 above, are wrapping this procedure in a PL/SQL block to return numbers instead of Boolean.
Application Continuity:
What happens to sessions with in-flight work that do not complete their work within drain_timeout?
This is where Application Continuity comes into play!
Application Continuity will restore the session state and replay the in-flight work on a servicing instance transparently for the application.
The database service needs to have the Application Continuity attributes set:
srvctl add service -db <db_unique_nam> -pdb <pdb_name> -service <service_name>
-preferred <instance_name>
-available <instance_name>
-stopoption IMMEDIATE
-notification true
-drain_timeout 300
-commit_outcome true
-retention 86400
-replay_init_time 600
-session_state DYNAMIC
-failover_restore LEVEL1
-failovertype TRANSACTION
For Transparent Application Continuity (an operation mode of Application Continuity) set failover_restore and failovertype to AUTO:
srvctl add service -db <db_unique_nam> -pdb <pdb_name> -service <service_name>
...
-session_state defaults to AUTO if failover_type is set to AUTO
-failover_restore AUTO
-failovertype AUTO
A key difference is that Application Continuity (AC) requires the application to use a Connection Pool, while Transparent Application Continuity(TAC) works with or without Connection Pools.
Explanation of the following attributes:
- commit_outcome set to true enables Transaction Guard, which is required by Application Continuity and Transparent Application Continuity.
- retention determines the amount of time (in seconds) that the commit outcome is retained in the database.
- replay_init_time specifies the difference between the time, in seconds, of the original execution of the first operation of a request and the time that the replay is ready to start after a successful reconnect. Application Continuity will not replay after the specified amount of time has passed. This parameter is intended to avoid the unintentional execution of a transaction when a system is recovered after a long period.
- session_state describes how the non-transactional session state is changed by the application within a request. Examples of the session states are NLS settings, optimizer preferences, event settings, PL/SQL global variables, and temporary tables.
Services for Active Data Guard
Create role-based services when using Active Data Guard. Set the role attribute to primary for services to be used for the primary site (read/write) and standby for services to be open on the standby site (read-only). The services start and stop automatically at a site based on their role.
srvctl add service -db <db_unique_nam> -pdb <pdb_name> -service <service_name>
...
-failover_restore AUTO
-failovertype AUTO
-role primary
Create the same service name with the same role on both primary and standby sites, e.g.:
#on primary and standby sites
srvctl add service -db RACCDB_POP -pdb pdb1 -service tacsrv -preferred RACCDB1,RACCDB2 -stopoption IMMEDIATE -notification true -drain_timeout 300 -commit_outcome true -retention 86400 -replay_init_time 600 -session_state AUTO -failover_restore AUTO -failovertype AUTO
-role primary
#on primary site
srvctl start service -db RACCDB_POP -service tacsrv
Note that we didn’t start the service on the standby site.
The service will automatically start after a switchover or failover operation.
Note 1: When using Active Data Guard with single-instance databases, remove the preferred and available attributes that are related to Oracle RAC databases.
Note 2: To create the same service name on both sites in Oracle Cloud, include the domain name in the service, e.g., tacsrv.company.com,
Using the default PDB or DB name is not recommended, as high availability attributes cannot be set for these services. Always create a custom service for your application and set high availability attributes accordingly.
Draining in Data Guard Environments
To allow services to drain during switchover operations, use the WAIT option of the switchover Data Gaurd broker command:
DGMGRL> switchover to <standby_db_unique_name> wait;
--example
DGMGRL> switchover to CDB01_POP wait;
Stopping services and waiting for sessions to drain...
The broker determines the maximum drain_timeout value for all active services and waits for up to that amount of time for all current client requests to be processed, before proceeding with the switchover.
Alternatively, you can specify the number of seconds to wait for sessions to drain.
DGMGRL> switchover to <standby_db_unique_name> wait <timeout_in_seconds>;
Services in Autonomous Database
Autonomous Database provides predefined services with different characteristics to be used for applications depending on their requirements.
(Transparent) Application Continuity can be enabled as follows:
Connection String
The application should use the recommended connection string configured for high availability:
Alias = (DESCRIPTION = (CONNECT_TIMEOUT=90)(RETRY_COUNT=50)(RETRY_DELAY=3)(TRANSPORT_CONNECT_TIMEOUT=3)
(ADDRESS_LIST = (LOAD_BALANCE=ON)
(ADDRESS = (PROTOCOL=TCP)(HOST=primary-scan)(PORT=1521)))
(ADDRESS_LIST = (LOAD_BALANCE=ON)
(ADDRESS = (PROTOCOL=TCP)(HOST=secondary-scan)(PORT=1521)))
(CONNECT_DATA = (SERVICE_NAME=gold-cloud)))
How To Configure Server Side Transparent Application Failover
This document provides basic instructions for configuring Server side (versus client) TAF or Transparant Application Failover.
Server side TAF settings override client-side counterparts that might be configured in TNS connect descriptors. If TAF is not configured on the client side, then at a minimum, the failover type must be set to enable TAF. If the failover type is set on the server side, then the failover method will default to BASIC. Delay and retries are optional and may be specified independently.
SOLUTION
1. Create a service on the RAC cluster to setup for TAF. Example creates a service called server_taf to the database called rac, with instance names rac1 & rac2.
Please note for the service name, select a name that is unique and not the same as the default service name created. A special Oracle database service is created by default for your Oracle RAC database. This default service is always available on all instances in an Oracle RAC environment, unless an instance is in restricted mode. You cannot alter this service or its properties.
$srvctl add service -d rac -s server_taf -r "rac1,rac2" -P BASIC
2. Start the service server_taf
$srvctl start service -d rac -s server_taf
3. Check service is running
$srvctl config service -d rac
ractest
PREF: rac1 rac2
AVAIL: server_taf
PREF: rac1 rac2
AVAIL:
4. Find the service_id value for the service just created
sqlplus /nolog
Connect / as sysdba
SQL> select name,service_id from dba_services where name = 'server_taf';
NAME SERVICE_ID
-------------- ----------
server_taf 6
5. Review the standard setup for the services
SQL>col name format a15
col failover_method format a11
heading 'METHOD'
col failover_type format a10
heading 'TYPE'
col failover_retries format 9999999
heading 'RETRIES'
col goal format a10
col clb_goal format a8
col AQ_HA_NOTIFICATIONS format a5
heading 'AQNOT'
SQL>select name, failover_method, failover_type, failover_retries,goal, clb_goal,aq_ha_notifications
from dba_services where service_id = 6
NAME METHOD TYPE RETRIES GOAL CLB_GOAL AQNOT
--------- ----------- ---------- -------- ---------- -------- -----
server_taf LONG NO
Please note there is no values for method, type or retries.
These are required to do server side TAF.
The cause of this problem has been identified and verified in an unpublished Bug 6886239DBMS_SERVICE parameters are not added using srvctl add service. This is fixed in release 11.2 onwards.
6. Add the server side failover parameters to the service. (đź“ŤPre 11.2)
Server side TAF method is BASIC.
BASIC is the only value currently supported.
This means that a new connection is established at failure time. It is not possible to pre-establish a backup connection.
(which is to say, PRECONNECT is not supported)
SQL> execute dbms_service.modify_service (service_name => 'server_taf' -
, aq_ha_notifications => true -
, failover_method => dbms_service.failover_method_basic -
, failover_type => dbms_service.failover_type_select -
, failover_retries => 180 -
, failover_delay => 5 -
, clb_goal => dbms_service.clb_goal_long);
PL/SQL procedure successfully completed.
đź“ŤFor 11.2 version and newer use SVRCTL to modify the service
srvctl modify service -d RAC -s server_taf -m BASIC -e SELECT -q TRUE -j LONG
Service can be checked with the command:
$srvctl config service -d RAC
Service name: server_taf
Service is enabled
Server pool: RAC_server_taf
Cardinality: 2
Disconnect: false
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: true
Failover type: SELECT
Failover method: BASIC
TAF failover retries: 0
TAF failover delay: 0
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: BASIC
Edition:
Preferred instances: RAC1,RAC2
Available instances:
7. Check the service and we can now see values for Method, Type and Retries.
SQL>select name, failover_method, failover_type, failover_retries,goal, clb_goal,aq_ha_notifications
from dba_services where service_id = 6
NAME METHOD TYPE RETRIES GOAL CLB_GOAL AQNOT
--------- ----------- ---------- -------- ---------- -------- -----
server_taf BASIC SELECT 180 NONE LONG YES
8. Check the listener has the service registered. (output will look similar too following, depending on version used)
$lsnrctl services
Service "server_taf.za.oracle.com" has 2 instance(s).
Instance "rac1", status READY, has 2 handler(s) for this service...
Handler(s):
"DEDICATED" established:0 refused:0 state:ready
REMOTE SERVER
(ADDRESS=(PROTOCOL=TCP)(HOST=node1)(PORT=1521))
"DEDICATED" established:0 refused:0 state:ready
LOCAL SERVER
Instance "rac2", status READY, has 1 handler(s) for this service...
Handler(s):
"DEDICATED" established:0 refused:0 state:ready
REMOTE SERVER
(ADDRESS=(PROTOCOL=TCP)(HOST=node2)(PORT=1521))
9. Create a net service name. Here we have client load balancing between the two nodes.
SERVERTAF =
(DESCRIPTION =
(LOAD_BALANCE = yes)
(ADDRESS = (PROTOCOL = TCP)(HOST = node1)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = node2)(PORT = 1521))
(CONNECT_DATA =
(SERVICE_NAME = server_taf.oracle.com)
)
)
10. Testing...
SQL> select host_name,instance_name from v$instance;
HOST_NAME INSTANCE_NAME
---------------- --------------------
node2 rac2
11. Shutdown the database in the node the connection has routed to
SQL> select instance_name from V$instance;
INSTANCE_NAME
----------------
rac2
SQL> shutdown abort;
ORACLE instance shut down.
12. TAF will now kick in
SQL> /
HOST_NAME INSTANCE_NAME
---------------- ----------------
node1 rac1
How to Setup SCAN Listener and Client for TAF and Load Balancing
Setup SCAN listener and Clients to use SCAN listener to connect to RAC using SCAN.
SOLUTION
For the purposes of this document, the following fictitious environment is used as an example to describe the procedure:
Service name : MYRAC, MYRAC_TAF and TEST
Verify SCAN Listener Configuration on Server
After grid installation is completed you can verify the SCAN Listener configuration on your server -
•At the $GRID_HOME/network/admin directory you will have two listener related files.
-rw-r--r-- 1 grid oinstall 887 Jul 13 09:33 listener.ora
-rw-r--r-- 1 grid oinstall 375 Jul 13 09:33 endpoints_listener.ora
example of listener.ora
LISTENER_SCAN3=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN3))))
# line added by Agent
LISTENER_SCAN2=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN2))))
# line added by Agent
LISTENER=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))) # line added by Agent
LISTENER_SCAN1=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1))))
# line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN1=ON
# line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER=ON
# line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN2=ON
# line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN3=ON
# line added by Agent
example of endpoints_listener.ora
LISTENER_DBGRAC1=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=xxxxx)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=xx.xx.xx.xx.x.xx)(PORT=1521)(IP=FIRST))))
# line added by Agent
Check Status of SCAN IPs and SCAN Listener
srvctl command can be used to check the status of SCAN IPs and SCAN listener -
$ srvctl status scan
SCAN VIP scan1 is enabled
SCAN VIP scan1 is running on node myrac2
SCAN VIP scan2 is enabled
SCAN VIP scan2 is running on node myrac1
SCAN VIP scan3 is enabled
SCAN VIP scan3 is running on node myrac1
Note that two SCAN IPs are online on node myrac1 and one is online on myrac2.
Respective SCAN listeners will be online on the nodes where the corresponding SCAN IP is online.
$ srvctl status scan_listener
SCAN Listener LISTENER_SCAN1 is enabled
SCAN listener LISTENER_SCAN1 is running on node myrac2
SCAN Listener LISTENER_SCAN2 is enabled
SCAN listener LISTENER_SCAN2 is running on node myrac1
SCAN Listener LISTENER_SCAN3 is enabled
SCAN listener LISTENER_SCAN3 is running on node myrac1
You can verify the instance registration with SCAN listeners using lsnrctl command -
$ lsnrctl status listener_scan2
LSNRCTL for Linux: Version 11.2.0.1.0 - Production on 28-AUG-2010 12:53:56
Copyright (c) 1991, 2009, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN2)))
STATUS of the LISTENER
------------------------
Alias LISTENER_SCAN2
Version TNSLSNR for Linux: Version 11.2.0.1.0 - Production
Start Date 23-AUG-2010 10:15:16
Uptime 5 days 2 hr. 38 min. 40 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/app/11.2.0/grid/network/admin/listener.ora
Listener Log File /u01/app/11.2.0/grid/log/diag/tnslsnr/myrac1/listener_scan2/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN2)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=xxx.xxx.xxx.xx)(PORT=1521)))
Services Summary...
Service "myrac" has 2 instance(s).
Instance "myrac1", status READY, has 2 handler(s) for this service...
Instance "myrac2", status READY, has 2 handler(s) for this service...
Service "myracXDB" has 2 instance(s).
Instance "myrac1", status READY, has 1 handler(s) for this service...
Instance "myrac2", status READY, has 1 handler(s) for this service...
The command completed successfully
Note that above output shows that the listener "listener_scan2" is running on myrac1 and both the instances are registered with this listener.
Each instance within the cluster registers itself with all the SCAN listener using remote_listener setting.
For e.g
NAME TYPE VALUE
-------------- ----------- ----------
local_listener string (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP) HOST=xxxxx)(PORT=1521))))
remote_listener string xxxxx:1521
The LOCAL_LISTENER will always point to the local VIP for the node while the REMOTE_LISTENER will point to <SCAN_NAME>:<PORT>
You can use srvctl command to change or edit the SCAN listener settings.
Refer Note 972500.1 : How to Modify SCAN Setting after Installation.
Configure Client to Connect Using SCAN
To configure client to connect using SCAN, configure the tnsnames.ora. For e.g -
MYRAC =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = xxxxx)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = myrac)
)
)
Configure Clients for Transparent Application Failover (TAF) Using SCAN
The above tns entry will connect to cluster using SCAN name "myrac-cluster-scan" to service "myrac" and assumes that the service is configured for server side TAF.
The default Service created at the time of installation and creation of RAC database is not TAF enabled.
You can create a TAF enabled service using srvctl command.
Refer Note 460982.1: How To Configure Server Side Transparent Application Failover
If you wish to use client side TAF setting for clients connecting using SCAN name, you can configure tnsnames.ora entry at the client side as -
MYRAC_TAF =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = xxxxx)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = myrac)
(FAILOVER_MODE=
(TYPE=select)
(METHOD=basic))))
You can verify the client connections to database for TAF using :
SQL> SELECT MACHINE, FAILOVER_TYPE, FAILOVER_METHOD, FAILED_OVER, COUNT(*)
FROM V$SESSION
GROUP BY MACHINE, FAILOVER_TYPE, FAILOVER_METHOD, FAILED_OVER;
MACHINE FAILOVER_TYPE
FAILOVER_METHOD FAILED_OVER COUNT(*)
----- ------- ------- ----- ---
XXXX SELECT BASIC NO 1
Older supported client versions which are using VIP Address TAF enabled connect string can continue to use VIP to connect to 11gR2 RAC setup.
Client Side Load Balancing
For clients connecting to RAC database via SCAN, the client side load balancing happens during the connection phase when the SCAN name gets resolved. Client side sqlnet trace can be used to verify this. For e.g
(3086231232) [28-AUG-2013 14:13:06:118] nlad_expand_hst: Expanding xxxxx
(3086231232) [28-AUG-2013 14:13:06:118] snlinGetAddrInfo: entry
(3086231232) [28-AUG-2013 14:13:11:119] snlinGetAddrInfo: exit
(3086231232) [28-AUG-2013 14:13:11:119] snlinGetNameInfo: entry
(3086231232) [28-AUG-2013 14:13:11:119] snlinGetNameInfo: exit
(3086231232) [28-AUG-2013 14:13:11:119] nlad_expand_hst: Adding IP xxx.xxx.xxx.xx
(3086231232) [28-AUG-2013 14:13:11:119] snlinGetNameInfo: entry
(3086231232) [28-AUG-2013 14:13:11:119] snlinGetNameInfo: exit
(3086231232) [28-AUG-2013 14:13:11:119] nlad_expand_hst: Adding IP xxx.xxx.xxx.xx
(3086231232) [28-AUG-2013 14:13:11:119] snlinGetNameInfo: entry
(3086231232) [28-AUG-2013 14:13:11:119] snlinGetNameInfo: exit
(3086231232) [28-AUG-2013 14:13:11:119] nlad_expand_hst: Adding IP xxx.xxx.xxx.xx
(3086231232) [28-AUG-2013 14:13:11:119] snlinFreeAddrInfo: entry
Note that the name "xxxxx" was resolved to three different IPs, client will randomly pick any of the IP for the connection to achieve client side load balancing.
A pre-11g Release 2 client (Oracle Database 11g Release 1 or Oracle Database 10g Release 2, or older) will not fully benefit from the advantages of SCAN.
For 12c and newer you can use sacn name and there us not challenging.
The older version Oracle Client will not be able to handle a set of three IPs returned by the DNS for SCAN.
Hence, it will try to connect to only the first address returned in the list and will more or less ignore the others. If the SCAN Listener listening on this specific IP is not available or the IP itself is not available, the connection will fail.
To workaround this problem :
•If you are using static SCAN IPs, include the address of all SCAN IPs in the tnsnames.ora entry.
TEST =
(DESCRIPTION =
(ADDRESS_LIST =
(LOAD_BALANCE=ON)
(FAILOVER=ON)
(ADDRESS = (PROTOCOL = TCP)(HOST = <scan-VIP1>)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = <scan-VIP2>)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = <scan-VIP3>)(PORT = 1521))
)
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = myrac)
(FAILOVER_MODE=
(TYPE=select)
(METHOD=basic))))
12c above:
TEST =
(DESCRIPTION =
(ADDRESS_LIST =
(LOAD_BALANCE=ON)
(FAILOVER=ON)
(ADDRESS = (PROTOCOL = TCP)(HOST = <scan-ip>)(PORT = 1521))
)
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = myrac)
(FAILOVER_MODE=
(TYPE=select)
(METHOD=basic))))
•If you are using GNS (dynamic SCAN IPs), use VIP listener for the connections.
However in this case the remote_listener parameter on each node should be set accordingly so that all instance cross registers with each VIP listener within the cluster.
Conclusion:
Always create and use custom database services for your application, as these allow you to set high availability attributes for Fast Application Notification (FAN), Draining, and Application Continuity.
In Data Guard environments, use role-based database services and create a service with the same name on all sites. Specify the WAIT option in the Data Guard broker switchover command to allow services to drain during switchover operations. Oracle Application Continuity hides database interruptions from end-users by replaying in-flight transactions on available database instances, completely transparent to end-users and without any application code changes needed. the Java code provided is an example to get you started using Oracle Application Continuity with Java. For production implementations, please follow the instructions and recommendations provided by Oracle.
Sincerely,
Alireza Kamrani
No comments:
Post a Comment