Fast Connection Node Affinity Awareness in Oracle RAC & internal mechanism related to RMAN
In this post I present the theory and technical notes needed to take RMAN backups in a RAC environment, such as:
· Node Affinity Awareness concept in Oracle
· RMAN Backup channel optimization
· Parallelism and Multiplexing RMAN Backups
· Load balancing RMAN backups across RAC instances.
To enhance backups, I also introduce a new clause, available in 26ai, in the service definition used by backup scripts.
Using RMAN to Create Backups in Oracle RAC
RMAN enables you to back up,
restore, and recover data files, control files, SPFILEs, and archived redo
logs. RMAN is included with the Oracle AI Database server and it is installed
by default. You can run RMAN from the command line or you can use it from the
Backup Manager in Oracle Enterprise Manager. In addition, RMAN is the
recommended backup and recovery tool if you are using Oracle Automatic Storage
Management (Oracle ASM). The procedures for using RMAN in Oracle RAC
environments do not differ substantially from those for Oracle non-cluster
environments.
What is Fast Connection Node Affinity Awareness in Oracle RAC?
In an Oracle RAC + Universal
Connection Pool (UCP) + FAN/ONS environment, the following features interplay:
- Oracle Universal Connection Pool (UCP): A JDBC connection-pooling mechanism
provided by Oracle for Java applications.
- Oracle Notification Service (ONS) and
Fast Application Notification (FAN):
These provide notification of cluster/node/service events (node up/down,
service relocation) for RAC environments, so clients can react quickly.
- Connection Affinity: This is where a connection pool tries
to maintain a “preferred” or “affined” session/connection to the same RAC
instance (node) when that is optimal (for example, to keep session state
local, avoid cross-node pinging, etc). In UCP, there are modes like
transaction-based affinity or web-session affinity.
- Fast Connection Failover (FCF): A UCP feature that, when enabled, uses
FAN/ONS to detect node/service failures quickly and remove/stabilize
connections in the pool.
Putting this together, “fast
connection node affinity awareness” likely means that the connection pool (UCP)
is aware of the node-instance topology in a RAC cluster (via FAN/ONS), and it
both:
- quickly reacts to node/failure events
(fast failover)
- maintains affinity (i.e., preferring the
same node when appropriate) for performance optimization.
In other words: the pool is aware of nodes, picks/keeps connections on the appropriate node, and fails over quickly if that node goes down or a service relocates.
This corresponds with documented
features: UCP’s Connection Affinity (node/instance-aware) and Fast Connection
Failover.
0. Node Affinity Awareness of Fast Connections
In some cluster database
configurations, some nodes of the cluster have faster access to certain data
files than to other data files.
RMAN automatically detects
this situation, which is known as node affinity awareness.
When deciding which channel to use
to back up a particular data file, RMAN gives preference to the nodes with
faster access to the data files that you want to back up.
For example, if you have a
three-node cluster, and if node1 has faster read/write access to data
files 2, 3, and 4 than the other nodes, then node1 has greater
node affinity to those files than node2 and node3.
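The three-node example above can be sketched as a toy model. This is only an illustration of the idea of "prefer the channel on the node with affinity": RMAN's internal selection logic is not published in this form, and the function name, the channel names, and the fallback rule (least-loaded channel) are assumptions made for the sketch.

```python
from typing import Dict, List


def assign_files(files: List[int],
                 channels: Dict[str, str],
                 affinity: Dict[str, List[int]]) -> Dict[int, str]:
    """Map each data file number to a channel name.

    channels: channel name -> node the channel is connected to.
    affinity: node -> data files that node can access faster.
    Files with no affine node go to the least-loaded channel (assumed rule).
    """
    load = {ch: 0 for ch in channels}
    result: Dict[int, str] = {}
    for file_no in files:
        # Prefer a channel whose node has affinity to this file.
        preferred = [ch for ch, node in channels.items()
                     if file_no in affinity.get(node, [])]
        candidates = preferred or list(channels)
        chosen = min(candidates, key=lambda ch: (load[ch], ch))
        load[chosen] += 1
        result[file_no] = chosen
    return result


channels = {"ch1": "node1", "ch2": "node2", "ch3": "node3"}
affinity = {"node1": [2, 3, 4]}  # node1 reads files 2, 3 and 4 faster
assignment = assign_files([1, 2, 3, 4, 5, 6], channels, affinity)
print(assignment)
```

In this model, files 2, 3, and 4 always land on the channel connected to node1, while the remaining files are spread over whichever channel currently has the least work.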
Historical Context and Versioning
This feature was first documented
in the Oracle8i Parallel Server (the predecessor to Oracle RAC)
documentation released in early 1999. It
was primarily a feature of Recovery Manager (RMAN) designed to optimize
backups in cluster environments.
| Feature Name | Oracle Version | Primary Use Case |
| --- | --- | --- |
| Node Affinity Awareness | Oracle 8i (8.1.5) | RMAN optimization for clusters (detecting faster data file access). |
| Node Affinity Awareness of Fast Connections | Oracle 10g Release 1 (10.1) | Refined term for the same RMAN cluster optimization feature. |
1. What "Fast Connection Node Affinity Awareness" actually does
One of the less discussed but very
practical behaviors in Oracle RAC environments is Node Affinity Awareness of
Fast Connections, especially when working with RMAN in clustered setups.
In RAC, not all nodes
always have equal I/O paths. Depending on storage architecture (ASM, SAN
zoning, local vs shared storage, etc.), some nodes may have faster or more
direct access to specific datafiles.
In a cluster database
configuration, some nodes may have faster physical or network access to
specific data files than others. Node Affinity Awareness allows RMAN to:
- Automatically detect which nodes have the
"fastest" connection to specific data files.
- Prioritize channels on those specific nodes when
backing up those files to reduce network overhead and improve performance.
1.1. What does that mean in practice?
When you allocate RMAN channels
across multiple RAC instances, RMAN can automatically prefer the instance that
has faster access to a given datafile. Instead of blindly distributing
workload, it optimizes backup operations based on storage affinity.
Why This Matters
In real-world environments:
- Mixed storage topologies are common
- Some nodes may access LUNs via different
HBAs or fabric paths
- Backup windows are tight
- Interconnect traffic must be minimized
Without node affinity awareness, a backup operation might read blocks from a remote node, pushing unnecessary traffic over the cluster interconnect.
With node affinity awareness:
- RMAN channels tend to process datafiles
locally
- Interconnect overhead is reduced
- Backup performance becomes more
predictable
- Scalability improves in larger RAC
clusters
Example
If you configure channels like this:
RMAN> CONFIGURE DEVICE TYPE DISK PARALLELISM 4;
Once those channels connect across a 2-node RAC (for example, through a load-balanced service in the channel CONNECT string), RMAN does not just randomly assign work. It considers instance-to-datafile locality and optimizes accordingly.
This is particularly beneficial
in:
- Large OLTP systems
- Exadata-like architectures
- Environments with asymmetric storage
paths
- Consolidated RAC clusters with heavy
archive generation
1.2. Why It’s Often Overlooked
Oracle does not present this as a flashy headline feature. It’s more of an internal optimization that dates back to Oracle8i and has carried its current name since 10g. Because of that, many DBAs assume they must manually control instance affinity in all cases.
In most modern RAC environments, RMAN already does the smart thing, provided the storage is properly configured and visible.
Subtle features like this are good
reminders that performance tuning in RAC is not always about adding more
channels or more nodes. Sometimes it’s about understanding how Oracle already
optimizes under the hood.
If you’re running backups in RAC
and still seeing high interconnect traffic during RMAN jobs, it’s worth
reviewing how channels are allocated and how storage is presented to each node.
More:
This feature (introduced in 10g
and enhanced around Oracle 19c–26ai) is a client-side and service-level
optimization that makes Oracle-aware components (like JDBC UCP, Oracle Call
Interface (OCI), and some internal utilities) aware of which RAC node a
session or connection should prefer.
Goal in new versions:
Keep client sessions connected to the “best” or “closest” RAC instance (for
performance and load balance) and to react fast to node or service failover
(via Fast Application Notification and ONS events).
So, it belongs to connection management, typically for applications, middle-tier pools, or database tools that support FAN/ONS and service-based connections.
2. What RMAN does in RAC context
RMAN (Recovery Manager) is Oracle’s backup and recovery tool.
When running in RAC, RMAN can:
- Spawn multiple channels across RAC instances (ALLOCATE CHANNEL ch1 DEVICE TYPE DISK CONNECT 'sys@inst1')
- Distribute workload across nodes for parallel backup and restore
- Use services to connect to instances dynamically
- Be aware of the cluster configuration via the control file and cluster registry
Example:
RUN {
  ALLOCATE CHANNEL ch1 DEVICE TYPE DISK CONNECT 'sys@rac1';
  ALLOCATE CHANNEL ch2 DEVICE TYPE DISK CONNECT 'sys@rac2';
  BACKUP DATABASE PLUS ARCHIVELOG;
}
Or dynamically via a service name:
rman target sys@DB_SERVICE
When connecting via a service name, RMAN uses the same connection infrastructure as other Oracle clients, so if that service has node affinity or FAN/FCF enabled, RMAN inherits that behavior.
3. Where they intersect conceptually
| Area | Fast Connection Node Affinity | RMAN in RAC |
| --- | --- | --- |
| Connection management | Maintains affinity to a RAC node (fast re-connection, ONS-aware). | Uses TNS services to connect to nodes; can also leverage FAN events indirectly when connecting via a service. |
| Awareness of nodes | Client tracks which node instance a session belongs to. | RMAN can allocate channels to specific instances or use load-balancing across RAC nodes. |
| Failover | Automatically redirects to healthy node if one fails. | If a node fails during backup, RMAN can retry failed channels on surviving nodes (since 11g). |
| Integration point | Uses the same Oracle Net and service infrastructure (ONS, FAN). | Uses the same infrastructure for connections and service failover. |
So, while RMAN doesn’t use “fast connection node affinity awareness” directly, it benefits indirectly because it connects through Oracle Net services that may have affinity and FAN enabled.
That means:
- If you define your RMAN target connection via a RAC service that’s configured with preferred instances or affinity settings, then RMAN sessions (channels) will honor that service topology, i.e., connect preferentially to the “affined” node or redirect if it fails.
Channel Connections to Cluster Instances with RMAN
Channel connections to the instances are determined using the connect string defined by channel configurations. For example, in the following configuration, three channels are allocated using USER/pwd@service_name. If you configure the SQL*Net service name with load balancing turned on, then the channels are allocated at a node as decided by the load balancing algorithm.
CONFIGURE DEVICE TYPE sbt PARALLELISM 3;
CONFIGURE DEFAULT DEVICE TYPE TO sbt;
CONFIGURE CHANNEL DEVICE TYPE SBT CONNECT 'USER/pwd@service_name'; # this service must have load balancing enabled
However, if the service name used in the connect string is not for load balancing, then you can control at which instance the channels are allocated using separate connect strings for each channel configuration, as follows:
CONFIGURE DEVICE TYPE sbt PARALLELISM 3;
CONFIGURE CHANNEL 1.. CONNECT 'USER/pwd@mydb_1'; # mydb_1 connects to instance 1
CONFIGURE CHANNEL 2.. CONNECT 'USER/pwd@mydb_2';
CONFIGURE CHANNEL 3.. CONNECT 'USER/pwd@mydb_3';
In the above example, it is assumed that mydb_1, mydb_2 and mydb_3 are SQL*Net service names that connect to pre-defined nodes in your Oracle RAC environment. Alternatively, you can also use manually allocated channels to back up your database files.
For example, the following command
backs up the SPFILE, control file, data files and archived redo logs:
RUN
{
  ALLOCATE CHANNEL CH1 DEVICE TYPE sbt CONNECT 'USER/pwd@mydb_1';
  ALLOCATE CHANNEL CH2 DEVICE TYPE sbt CONNECT 'USER/pwd@mydb_2';
  ALLOCATE CHANNEL CH3 DEVICE TYPE sbt CONNECT 'USER/pwd@mydb_3';
  BACKUP DATABASE PLUS ARCHIVELOG;
}
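When the list of instances changes often, a RUN block with one channel per instance can be generated rather than typed by hand. The following is a minimal sketch, assuming the same placeholder credentials (USER/pwd) and net service names (mydb_1 to mydb_3) as the example above; the helper name render_run_block and the default device type are hypothetical choices for illustration.

```python
from typing import List


def render_run_block(net_services: List[str],
                     user: str = "USER",
                     device: str = "sbt") -> str:
    """Build an RMAN RUN block with one channel per net service name.

    'pwd' is the same placeholder password used in the article's examples;
    a real script would take credentials from a wallet or a prompt.
    """
    lines = ["RUN", "{"]
    for i, svc in enumerate(net_services, start=1):
        lines.append(
            f"  ALLOCATE CHANNEL CH{i} DEVICE TYPE {device} "
            f"CONNECT '{user}/pwd@{svc}';"
        )
    lines.append("  BACKUP DATABASE PLUS ARCHIVELOG;")
    lines.append("}")
    return "\n".join(lines)


script = render_run_block(["mydb_1", "mydb_2", "mydb_3"])
print(script)
```

The generated text can be saved to a command file and passed to RMAN; dropping a node from the list is then enough to keep backups off that node.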
During a backup operation, if at least one channel
allocated has access to the archived log, then RMAN automatically schedules the
backup of the specific log on that channel. Because the control file, SPFILE,
and data files are accessible by any channel, the backup operation of these
files is distributed across the allocated channels.
For a local archiving scheme,
there must be at least one channel allocated to all of the nodes that write to
their local archived logs. For a cluster file system archiving scheme, if every
node writes the archived logs in the same cluster file system, then the backup
operation of the archived logs is distributed across the allocated channels.
During a backup, the instances to
which the channels connect must be either all mounted or all open. For example,
if the instance on node1 has the database mounted while the instances
on node2 and node3 have the database open, then the backup
fails.
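The "all mounted or all open" rule can be checked before a backup starts. The sketch below assumes the status of each instance has already been collected (for example from the STATUS column of GV$INSTANCE) into a dictionary; the function name and the sample data are hypothetical.

```python
from typing import Dict


def backup_state_ok(instance_status: Dict[str, str]) -> bool:
    """Return True only if every instance is MOUNTED or every instance is OPEN."""
    statuses = {status.upper() for status in instance_status.values()}
    # A single distinct status that is either MOUNTED or OPEN is acceptable.
    return statuses in ({"MOUNTED"}, {"OPEN"})


mixed = {"node1": "MOUNTED", "node2": "OPEN", "node3": "OPEN"}
all_open = {"node1": "OPEN", "node2": "OPEN", "node3": "OPEN"}
print(backup_state_ok(mixed))     # the mixed case from the text is rejected
print(backup_state_ok(all_open))
```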
Note: in a three-node production RAC environment, we usually take backups on one specific node to minimize the load impact and leave the other nodes for client and application workload.
So, these configurations can be used as different backup policies.
Deleting Archived Redo Logs after a Successful Backup
Learn how to delete archived redo
logs after backups.
If you have configured the
automatic channels as defined in section "Channel Connections to
Cluster Instances with RMAN", then you can use the following example to
delete the archived logs that you backed up n times. The
device type can be DISK or SBT:
DELETE ARCHIVELOG ALL BACKED UP n TIMES TO DEVICE TYPE device_type;
During a delete operation, if at
least one channel allocated has access to the archived log, then RMAN
automatically schedules the deletion of the specific log on that channel. For a
local archiving scheme, there must be at least one channel allocated that can
delete an archived log. For a cluster file system archiving scheme, if every
node writes to the archived logs on the same cluster file system, then the
archived log can be deleted by any allocated channel.
If you have not configured
automatic channels, then you can manually allocate the maintenance channels as
follows and delete the archived logs.
ALLOCATE CHANNEL FOR MAINTENANCE DEVICE TYPE DISK CONNECT 'SYS/oracle@node1';
ALLOCATE CHANNEL FOR MAINTENANCE DEVICE TYPE DISK CONNECT 'SYS/oracle@node2';
ALLOCATE CHANNEL FOR MAINTENANCE DEVICE TYPE DISK CONNECT 'SYS/oracle@node3';
DELETE ARCHIVELOG ALL BACKED UP n TIMES TO DEVICE TYPE device_type;
Suppose you define a service for RMAN operations:
srvctl add service -db MYDB -service RMAN_SVC \
-preferred "RAC1,RAC2" -available "RAC3" -clbgoal LONG -rlbgoal THROUGHPUT
Optional enhancements (if using 26ai+)
-rlbgoal {NONE | SMART_CONN | SERVICE_TIME | THROUGHPUT}: Runtime Load Balancing Goal (for the Load Balancing Advisory).
Setting the run-time connection load balancing goal to NONE disables load balancing for the service. Set this parameter to SMART_CONN to enable Smart Connection Rebalance, to SERVICE_TIME to balance connections by response time, or to THROUGHPUT to balance connections by throughput.
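If services are created from a deployment script, the srvctl arguments can be assembled and validated before anything is executed. This is a minimal sketch that only builds the command line shown above and restricts -rlbgoal to the four values listed in this article; the function name is hypothetical and nothing is run against a cluster.

```python
import shlex
from typing import List

# Values taken from the -rlbgoal list in this article (SMART_CONN needs 26ai+).
RLB_GOALS = {"NONE", "SMART_CONN", "SERVICE_TIME", "THROUGHPUT"}


def srvctl_add_service(db: str, service: str,
                       preferred: List[str], available: List[str],
                       clbgoal: str = "LONG",
                       rlbgoal: str = "THROUGHPUT") -> List[str]:
    """Return the srvctl argument list for a backup service definition."""
    if rlbgoal not in RLB_GOALS:
        raise ValueError(f"unsupported -rlbgoal value: {rlbgoal}")
    return [
        "srvctl", "add", "service",
        "-db", db, "-service", service,
        "-preferred", ",".join(preferred),   # no spaces between instance names
        "-available", ",".join(available),
        "-clbgoal", clbgoal, "-rlbgoal", rlbgoal,
    ]


cmd = srvctl_add_service("MYDB", "RMAN_SVC", ["RAC1", "RAC2"], ["RAC3"])
print(shlex.join(cmd))
```

Keeping the command as an argument list avoids quoting problems, such as a stray space inside the -preferred list, when it is later passed to a process launcher.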
Smart Connection Rebalance
Smart Connection Rebalance
automatically routes sessions to an instance with the intent to optimize
performance by monitoring the access patterns of the underlying objects of the
workload.
Oracle Real Application Clusters (Oracle RAC) offers two options for load balancing: client-side load balancing and server-side load balancing. Sessions connect to an Oracle RAC instance using Single Client Access Name (SCAN) and a user-defined service name. You can configure a service to run on all or a subset of Oracle RAC instances. By default, SCAN redirects the sessions to the local listener and the SCAN listener directs a connection request to the best instance currently hosting the service, based on the -clbgoal and -rlbgoal settings for the service.
Smart Connection Rebalance avoids
resource conflict and ensures that workloads accessing similar objects end up
in one instance and benefit from the reduced inter-instance network messages
and data block transfers over the private network. This feature ensures optimum
load balancing and performance. Oracle RAC features, such as partitioning,
local indexes, Right Growing Index (RGI) optimizations, and Exafusion help
reduce resource conflict.
You can enable Smart Connection Rebalance by setting the -rlbgoal attribute to SMART_CONN:
$ srvctl modify service -db db_unique_name -service service_name -rlbgoal SMART_CONN
To disable Smart Connection Rebalance, set the -rlbgoal of that service to SERVICE_TIME.
This feature performs real-time monitoring of different workloads and attempts to transparently relocate service-based connections across Oracle RAC instances to significantly improve database performance.
Note: The connection relocation is automatic and does not require database administrators to manually distribute the sessions.
Notes for RMAN usage
1. The core concept: RMAN channels = Oracle sessions = RAC instances
When you run RMAN in a RAC
environment with a service that spans multiple nodes (for example, RMAN_SVC
active on RAC1, RAC2, RAC3), each RMAN channel becomes a separate Oracle
session.
Each session connects through SCAN listeners to one of the nodes where the
service is running.
Example
RUN {
  ALLOCATE CHANNEL ch1 DEVICE TYPE DISK CONNECT 'sys@RMAN_SVC';
  ALLOCATE CHANNEL ch2 DEVICE TYPE DISK CONNECT 'sys@RMAN_SVC';
  ALLOCATE CHANNEL ch3 DEVICE TYPE DISK CONNECT 'sys@RMAN_SVC';
  BACKUP DATABASE;
}
The SCAN listener then routes each channel to an instance, for example:
ch1 → RAC1
ch2 → RAC2
ch3 → RAC3
(the actual placement depends on the service load balancing goals and node affinity, so an even spread is typical but not guaranteed).
2. How RMAN
distributes work per channel
Once the channels are connected,
RMAN queries the control file to get the list of datafiles and their block
ranges, then divides them among channels based on:
- number of allocated channels,
- channel throughput,
- size of datafiles.
There’s no static “file 1–10 to
node 1” rule — RMAN dynamically partitions the workload.
However, it tries to balance load fairly evenly across channels (and
therefore across nodes).
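To make the idea of dynamic balancing concrete, here is a toy model that spreads data files over channels by size, always giving the next-largest file to the channel with the least work so far. This is a generic greedy heuristic chosen for illustration, not RMAN's actual scheduling algorithm, and the file sizes are made-up values.

```python
import heapq
from typing import Dict, List


def distribute(file_sizes_gb: Dict[int, int],
               channels: List[str]) -> Dict[str, List[int]]:
    """Greedy largest-first assignment of data files to channels."""
    heap = [(0, ch) for ch in channels]  # (assigned GB so far, channel name)
    heapq.heapify(heap)
    plan: Dict[str, List[int]] = {ch: [] for ch in channels}
    # Largest files first; ties broken by file number for determinism.
    ordered = sorted(file_sizes_gb.items(), key=lambda kv: (-kv[1], kv[0]))
    for file_no, size in ordered:
        load, ch = heapq.heappop(heap)   # least-loaded channel
        plan[ch].append(file_no)
        heapq.heappush(heap, (load + size, ch))
    return plan


sizes = {1: 40, 2: 30, 3: 20, 4: 10, 5: 10, 6: 10}  # hypothetical sizes in GB
plan = distribute(sizes, ["ch1", "ch2"])
print(plan)
```

With these sample sizes both channels end up with 60 GB of work, which is the kind of even split the text describes.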
You can see the mapping in the RMAN output: each channel logs which datafile it’s backing up.
Example:
channel ch1: starting full datafile backup set
channel ch1: specifying datafile(s)
channel ch1: datafile 1, 3, 5
channel ch2: datafile 2, 4, 6
3. If you want manual control (file→instance mapping)
You can explicitly control which node backs up which datafiles by connecting each channel directly to a specific instance instead of using SCAN.
Example:
RUN {
  ALLOCATE CHANNEL ch1 DEVICE TYPE DISK CONNECT 'sys@RAC1';
  ALLOCATE CHANNEL ch2 DEVICE TYPE DISK CONNECT 'sys@RAC2';
  ALLOCATE CHANNEL ch3 DEVICE TYPE DISK CONNECT 'sys@RAC3';
  BACKUP AS COMPRESSED BACKUPSET DATABASE;
}
Now:
- ch1 only runs on RAC1
- ch2 only runs on RAC2
- ch3 only runs on RAC3
You can even manually partition datafiles:
RUN {
  ALLOCATE CHANNEL ch1 DEVICE TYPE DISK CONNECT 'sys@RAC1'; # instance 1
  ALLOCATE CHANNEL ch2 DEVICE TYPE DISK CONNECT 'sys@RAC2';
  ALLOCATE CHANNEL ch3 DEVICE TYPE DISK CONNECT 'sys@RAC3';
  BACKUP
    (DATAFILE 1,3,5,7 CHANNEL ch1)
    (DATAFILE 2,4,6 SECTION SIZE 100M CHANNEL ch2)
    (ARCHIVELOG FROM SEQUENCE 100 UNTIL SEQUENCE 102 THREAD 1 CHANNEL ch3)
    (CURRENT CONTROLFILE TAG 'DBSTREP_CTRLFILE' CHANNEL ch2)
    (SPFILE TAG 'DBSTREP_SPFILE' CHANNEL ch2);
}
This is the classic pattern
used in large RAC environments (Exadata, etc.) when you want full control of
which instance handles which portion of the database.
4. If you prefer automatic distribution (recommended)
Instead of manual mapping, you can just rely on configured channels that all connect through the service:
CONFIGURE DEVICE TYPE DISK PARALLELISM 6;
CONFIGURE CHANNEL DEVICE TYPE DISK CONNECT 'sys@RMAN_SVC';
BACKUP DATABASE;
Oracle’s service load balancing + Fast
Connection Node Affinity (from 26ai onwards) ensures:
- Channels get spread evenly across nodes.
- Each channel stays “affinitized” to one
node for the backup session.
- Performance adapts dynamically.
This is simpler and still efficient, especially if your goal is balanced throughput, not strict file mapping.
Determining Channel Parallelism to Match Hardware Devices
RMAN can perform the I/O required
for many commands in parallel, to make optimal use of your hardware resources.
To perform I/O in parallel, however, the I/O must be associated with a single
RMAN command, not a series of commands. For example, it can be more efficient
to back up three datafiles using a command such as:
BACKUP DATAFILE 5,6,7;
rather than issuing the commands
BACKUP DATAFILE 5;
BACKUP DATAFILE 6;
BACKUP DATAFILE 7;
When all three datafiles are
backed up in one command, RMAN recognizes the opportunity for parallelism and
can use multiple channels to do the I/O in parallel. When three separate
commands are used, RMAN can only perform the backups one at a time, regardless
of available channels and I/O devices.
The number of channels available
(whether allocated in a RUN block or configured in advance) for use with a
device at the moment that you run a command determines whether RMAN will read
from or write to that device in parallel while carrying out the command.
Failing to allocate the right number of channels adversely affects RMAN
performance during I/O operations.
As a rule, the number of channels
used in carrying out an individual RMAN command should match the number of
physical devices accessed in carrying out that command. If manually allocating
channels for a command, allocate one for each device; if configuring automatic
channels, configure the PARALLELISM setting appropriately.
When backing up to tape, you
should allocate one channel for each tape drive. When backing up to disk,
allocate one channel for each physical disk, unless you can optimize the backup
for your disk topography by using multiple disk channels. Each manually
allocated channel uses a separate connection to the target or auxiliary
database.
The following script creates three
backups sequentially: three separate BACKUP commands are used to back
up one file each. Only one channel is active at any one time because only one
file is being backed up in each command.
RUN
{
  ALLOCATE CHANNEL c1 DEVICE TYPE sbt;
  ALLOCATE CHANNEL c2 DEVICE TYPE sbt;
  ALLOCATE CHANNEL c3 DEVICE TYPE sbt;
  BACKUP DATAFILE 5;
  BACKUP DATAFILE 6;
  BACKUP DATAFILE 7;
}
The following statement uses parallelization on the same example:
one RMAN BACKUP command backs up three datafiles, with all three
channels in use. The three channels are concurrently active; each
server session copies one of the datafiles to a separate tape drive.
RUN
{
  ALLOCATE CHANNEL c1 DEVICE TYPE sbt;
  ALLOCATE CHANNEL c2 DEVICE TYPE sbt;
  ALLOCATE CHANNEL c3 DEVICE TYPE sbt;
  BACKUP DATAFILE 5,6,7;
}
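The difference between the two scripts can be estimated with a simple timing model: separate BACKUP commands run one after another, while a single command lets the channels work concurrently. The per-file durations below are invented numbers, and the model ignores device contention and multiplexing, so treat it as a rough sketch only.

```python
from typing import List


def sequential_minutes(durations: List[int]) -> int:
    """Separate BACKUP commands: files are backed up one at a time."""
    return sum(durations)


def parallel_minutes(durations: List[int], channels: int) -> int:
    """One BACKUP command: each file goes to the currently least-busy channel."""
    loads = [0] * channels
    for d in sorted(durations, reverse=True):
        loads[loads.index(min(loads))] += d
    return max(loads)


durations = [12, 20, 15]  # hypothetical minutes for datafiles 5, 6 and 7
print(sequential_minutes(durations))    # three commands, one active channel
print(parallel_minutes(durations, 3))   # one command, three active channels
```

Under these assumptions the elapsed time drops from the sum of the three backups to the duration of the slowest one.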
Summary
| Aspect | Description |
| --- | --- |
| Direct dependency | None. RMAN does not directly implement “fast connection node affinity awareness.” |
| Indirect relationship | Yes, through RAC service definitions and FAN/ONS infrastructure that both RMAN and application clients use. |
| Benefit to RMAN | Faster reconnection or service relocation when a node fails during backups, if RMAN connects via a FAN/affinity-enabled service. |
| Versions | The connection affinity awareness is built into Oracle Client / OCI starting around 19c–26ai; RMAN benefits when using these clients and services. |
Conclusion:
RMAN can take advantage of “fast connection node affinity awareness” indirectly. When RMAN connects via a RAC service that has affinity and FAN/ONS enabled, it benefits from the same infrastructure: it uses the node affinity value to connect to the faster node, and it handles node failover based on service characteristics, service redirection, and balanced channel allocation.