Monday, February 9, 2026

NVMe Over Fabrics (NVMe-oF) with Oracle ASM – Oracle AI Database 26ai


With Oracle AI Database 26ai, Oracle introduces native support for NVMe over Fabrics (NVMe-oF) over TCP/IP, enabling Oracle ASM to use remote NVMe storage with near-local performance.

This is an important evolution in Oracle storage architecture — and a key point to highlight:

The network fabric is Ethernet (TCP/IP), NOT Fibre Channel.



The Oracle Grid Infrastructure server works as an initiator that connects to an NVMe-oF storage target created with the Linux kernel nvmet_tcp module.

You can use NVMe-oF storage devices to create Oracle ASM disk groups. These disk groups can store Oracle AI Database data files. 

This configuration extends Oracle ASM capabilities to the NVMe-oF storage devices. 

Direct access to NVMe-oF storage targets from Oracle AI Database servers offers lower latency and greater throughput for I/O operations.

Oracle Grid Infrastructure 26ai works only with NVMe over Fabrics storage targets running Linux kernel 5.4 or later.

Use an Ethernet Network Interface Card (NIC) on your Oracle Grid Infrastructure server to connect to an NVMe-oF storage target. 

Oracle recommends that you use a 25 GbE or higher Ethernet NIC to minimize latency.

Note:
The performance of databases stored on the NVMe-oF devices depends on the performance of the network connection between the Oracle Grid Infrastructure servers and the NVMe-oF storage targets. 

For optimal performance, Oracle recommends that you connect the Oracle Grid Infrastructure servers to the NVMe-oF storage targets using private dedicated network connections.

Architecture Overview

  • Oracle Grid Infrastructure 26ai acts as the NVMe-oF initiator

  • Connects to NVMe-oF targets created using the Linux kernel nvmet_tcp module

  • Communication happens over standard Ethernet using TCP/IP

  • Oracle ASM disk groups can be created directly on NVMe-oF devices

  • These disk groups store Oracle AI Database data files

Key Capabilities

  • Direct user-space NVMe-oF access from Oracle processes

  • Low latency & high throughput compared to traditional network storage

  • Extends ASM beyond local disks to remote NVMe storage

  • No FC HBAs, no FC switches, no SAN zoning

Technical Requirements

  • Oracle Grid Infrastructure 26ai

  • Linux kernel 5.4 or later

  • Ethernet-based NVMe-oF over TCP

  • Recommended 25 GbE or higher NIC

  • Dedicated network fabric for storage traffic

Important Note on Networking

Although NVMe-oF is a “fabric” technology:

  •  It is not Fibre Channel

  •  It does not use NVMe-FC

  •  It uses Ethernet + TCP/IP

Performance depends heavily on network quality.
For optimal results, Oracle recommends:

  • Private, dedicated Ethernet connections

  • Low-latency switching

  • No congestion or oversubscription

When properly designed, NVMe-oF over TCP + Oracle ASM delivers scalable, high-performance storage without the complexity of Fibre Channel — ideal for modern Oracle AI workloads.


 Do you need specific RPMs on Oracle Linux?

No extra or special RPMs are required beyond what comes with:

  • Oracle Linux 8/9

  • Oracle Grid Infrastructure 26ai

Why?

  • Oracle uses its own native user-space NVMe-oF initiator

  • The initiator is embedded in Grid Infrastructure 26ai

  • You do NOT use:

    • nvme-cli

    • nvme-tcp kernel initiator

    • Any external NVMe user tools

What is required on the OS?

  • Linux kernel 5.4+ (already standard on OL8/OL9)

  • Standard networking stack

  • Ethernet NIC drivers (25 GbE+ recommended)

 The NVMe-oF target (storage side) does require:

  • nvmet

  • nvmet-tcp kernel modules
    (but that’s on the storage server, not the Oracle DB server)
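
For reference, a minimal sketch of what the target side can look like using the kernel's configfs interface (the NQN, backing device, IP address, and port below are placeholders, not values from this post):

# On the storage server only: load the target modules
modprobe nvmet
modprobe nvmet-tcp

# Create a subsystem and expose one namespace backed by a local NVMe device
cd /sys/kernel/config/nvmet
mkdir subsystems/nqn.2026-02.com.example:asm01
echo 1 > subsystems/nqn.2026-02.com.example:asm01/attr_allow_any_host
mkdir subsystems/nqn.2026-02.com.example:asm01/namespaces/1
echo -n /dev/nvme0n1 > subsystems/nqn.2026-02.com.example:asm01/namespaces/1/device_path
echo 1 > subsystems/nqn.2026-02.com.example:asm01/namespaces/1/enable

# Publish the subsystem on a TCP port
mkdir ports/1
echo 10.0.0.5 > ports/1/addr_traddr
echo tcp      > ports/1/addr_trtype
echo 4420     > ports/1/addr_trsvcid
echo ipv4     > ports/1/addr_adrfam
ln -s /sys/kernel/config/nvmet/subsystems/nqn.2026-02.com.example:asm01 \
      ports/1/subsystems/nqn.2026-02.com.example:asm01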

 Is there any new configuration at the ASM layer?

No new ASM concepts, no new disk types

From an ASM point of view:

  • NVMe-oF disks look like regular ASM disks

  • Disk groups are created exactly the same way

  • ASM redundancy, rebalance, failure groups → unchanged

What does change behind the scenes?

  • The path to the storage

  • ASM disks are backed by remote NVMe-oF namespaces

  • Access is managed internally by Grid Infrastructure

Typical ASM workflow (unchanged)

asmcmd afd_state
asmcmd afd_label NVME01 /dev/oracle_nvme0
CREATE DISKGROUP DATA_NVME EXTERNAL REDUNDANCY DISK 'AFD:NVME01';
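
To verify the result, the usual ASM views apply unchanged (standard data dictionary views; nothing NVMe-specific is needed):

SELECT name, path, state, os_mb
FROM   v$asm_disk;

SELECT name, state, type, total_mb, free_mb
FROM   v$asm_diskgroup;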

  • No new ASM parameters

  • No ASM patches

  • No ASM driver changes

 What is new operationally?

 Grid Infrastructure layer

  • GI handles:

    • NVMe-oF session lifecycle

    • Namespace discovery

    • Error handling and reconnects

 Network becomes critical

  • Latency-sensitive

  • Requires:

    • Dedicated Ethernet fabric

    • Stable MTU (jumbo frames if supported)

    • No packet loss
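
A quick way to sanity-check that fabric from the database server (the interface name and target IP below are placeholders; use jumbo frames only if every switch in the path supports them):

# Set and verify the MTU on the dedicated storage NIC
ip link set dev ens5 mtu 9000
ip -d link show dev ens5 | grep mtu

# Confirm the path carries 9000-byte frames without fragmentation
# (8972 = 9000 minus 20-byte IP header and 8-byte ICMP header)
ping -M do -s 8972 -c 5 10.0.0.5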

#OracleDatabase #Oracle26ai #NVMeoF #OracleASM #GridInfrastructure #Ethernet #TCPIP #HighPerformanceStorage #DatabaseArchitecture

Sunday, February 1, 2026

Why the Same SQL Is Slower on Standby Than Primary?

Are you seeing a specific query respond more slowly on your standby database than on the primary?

 

Introduction:  Why the Same SQL Is Slower on Standby Than Primary

When a SQL query runs slower on an Active Data Guard standby than on the primary database, the first instinct is often to blame bad SQL, missing indexes, or insufficient resources. Oracle explicitly warns that this assumption is often wrong.

“A particular query response time or throughput is not guaranteed to be the same on the standby system as it was on the primary.”

This is not a legal disclaimer or a vague recommendation—it is a direct consequence of how Active Data Guard enforces read consistency while redo apply is continuously running.

On a primary database, queries read data that is locally consistent using undo that is immediately available. On a standby database, however, every query must ensure that the data it sees is consistent with a specific System Commit Number (SCN), and that SCN must already be processed by redo apply. When redo apply lags—even slightly—queries may wait, sometimes turning very short SQL statements into unexpectedly slow operations.

Understanding this difference is essential before attempting any tuning. In many cases, the query itself is perfectly efficient—the delay is caused by standby consistency mechanics, not poor SQL design.

 

What if a specific query does not meet your requirements?

Consult with a performance engineer and follow the recommendations in Database Performance Tuning Guide. A particular query response time or throughput or report elapsed time is not guaranteed to be the same on the standby system as it was on the primary. Analyze the system resources, SQL plans, overall CPU work time and wait times. For example, you may see standby query SCN advance wait is contributing to a much longer elapsed time in one of your short queries. This wait increase is attributed to Active Data Guard redo apply. If a query sees a certain row in a data block and needs to roll it back because the transaction has not committed as of the query System Commit Number (SCN), it needs to apply corresponding undo to get a consistent read for that query. If the redo for the corresponding undo change has not been applied by redo apply yet, the query needs to wait. The presence of such wait is itself not an issue, and typically may be a couple of milliseconds, but it will vary by workload and may be higher in Real Application Cluster database systems.

 I’ll break this into 5 layers:

  1. What Oracle is warning you about (conceptual meaning)
  2. Why standby query performance is inherently different from primary
  3. Deep dive into standby query SCN advance wait
  4. How to diagnose a “bad” query on standby
  5. What you can do about it (tuning & design actions)

 What Oracle is really saying

"A particular query response time or throughput is not guaranteed to be the same on standby as on primary."

This is not a generic disclaimer; it’s a hard architectural reality:

Primary                              | Active Data Guard Standby
-------------------------------------|----------------------------------------------
Foreground sessions modify data      | Foreground sessions only read
No redo apply                        | Continuous redo apply
No SCN synchronization waits         | SCN must be consistent with redo apply
Undo is always locally present       | Undo may depend on redo being applied

 

 On standby, your query is competing with redo apply and consistency mechanisms, not just CPU/IO.

So, Oracle is telling you:

If a query is slower on standby, do not assume it is SQL inefficiency; first analyze consistency waits and redo apply behavior.

 

 Why standby queries behave differently

On Primary:

A consistent read only needs:

  • Local undo
  • Current buffer cache
  • No coordination with redo

 

On Standby (Active Data Guard):

A query must:

  1. Determine its query SCN.
  2. Ensure the block version it reads is consistent as of that SCN.
  3. If the block contains changes from transactions not committed at that SCN:
    • Roll them back using undo.
  4. But that undo itself may not be available until redo apply has processed it.

So, a query may wait for redo apply to catch up enough to make the undo visible.

That is where this wait comes from:

standby query scn advance

 Deep dive: standby query scn advance

What exactly does this wait mean?

It means:

“My query SCN is ahead of what redo apply has processed, so I must wait for redo apply to advance.”

This usually happens when:

  • Redo generation is high on primary
  • Redo transport or apply is slightly lagging
  • Query needs a very recent consistent version

 

Timeline illustration

Primary SCN:        1000 ---- 1010 ---- 1020

Redo applied:       990 ---- 995 ---- 1002

Query wants SCN:                    1015

standby query scn advance wait

 

 

Your query cannot see SCN 1015 yet, because redo apply hasn’t reached that point.

 

Why rollback (undo) makes it worse

If your query reads a block that contains changes from an uncommitted transaction:

  • It must rollback using undo
  • But that undo record itself is in redo
  • If redo apply hasn’t applied that undo yet → wait

Thus:

A short query can become slow if it hits blocks with recent changes.

 

How to diagnose a query that doesn’t meet requirements

 

Step 1: Confirm it is a standby consistency wait

SELECT event, total_waits, time_waited_micro
FROM   v$system_event
WHERE  event LIKE 'standby%';

Or session level:

SELECT sid, event, seconds_in_wait
FROM   v$session
WHERE  event = 'standby query scn advance';
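
If the Diagnostics Pack is licensed, Active Session History can attribute this wait to specific statements (a minimal sketch):

SELECT sql_id, COUNT(*) AS samples
FROM   v$active_session_history
WHERE  event = 'standby query scn advance'
GROUP  BY sql_id
ORDER  BY samples DESC;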

 

Step 2 — Check apply lag

SELECT name, value, unit
FROM   v$dataguard_stats
WHERE  name IN ('apply lag', 'transport lag');

Even a few seconds of lag can affect “fresh data” queries.

 

 

Step 3 — Check redo rate vs apply capacity

SELECT * FROM v$recovery_progress;

SELECT * FROM v$managed_standby;
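
To focus on the apply and transport processes themselves, a narrower check (note that V$MANAGED_STANDBY is deprecated in recent releases in favor of V$DATAGUARD_PROCESS; MRP0 is the managed recovery process and RFS receives redo):

SELECT process, status, thread#, sequence#, block#
FROM   v$managed_standby
WHERE  process IN ('MRP0', 'RFS');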

Look for:

  • MRP lagging
  • Single-threaded apply in high redo environments
  • IO bottlenecks on standby redo logs or datafiles

 

Step 4 — Identify affected SQL

SELECT sql_id, elapsed_time, executions
FROM   v$sql
ORDER BY elapsed_time DESC;

Cross-check whether the same SQL is fast on primary but slow on standby.
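
A quick per-execution comparison, run on both primary and standby, helps separate plan problems from standby consistency waits (&sql_id is a placeholder for the statement under investigation):

SELECT sql_id, executions,
       ROUND(elapsed_time / NULLIF(executions, 0) / 1000, 1) AS avg_ms
FROM   v$sql
WHERE  sql_id = '&sql_id';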

 

What you can do about it

A) Reduce redo apply pressure

  • Use larger Standby Redo Logs (see the example below)
  • Ensure fast storage for redo and undo
  • Enable parallel redo apply if needed:

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE PARALLEL 8;
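
For the first point, a minimal sketch of adding standby redo log groups (this assumes Oracle-Managed Files and a two-thread RAC primary; the usual guidance is one more group per thread than the online redo logs, sized identically):

ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 SIZE 2G;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 2 SIZE 2G;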

B) Route queries intelligently

Not all queries should go to standby:

Query Type                 | Good for Standby?
---------------------------|-------------------
Historical reporting       | Yes
Fresh transactional data   | Maybe
“Just committed” lookups   | Avoid
Operational dashboards     | Depends

 

C) Use Data Guard service classes

Create services with different lag tolerance:

srvctl add service -d DB -s RO_FAST -l PRIMARY

srvctl add service -d DB -s RO_LAG_TOLERANT -l PHYSICAL_STANDBY
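
Sessions running on the standby can also bound how stale their reads may be; this is a standard Active Data Guard session setting (the 30-second limit below is an arbitrary example):

-- Queries in this session raise an error if the standby's apply lag exceeds 30 seconds
ALTER SESSION SET STANDBY_MAX_DATA_DELAY = 30;

-- Alternatively, wait until the standby has applied all redo generated as of this point
ALTER SESSION SYNC WITH PRIMARY;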

 

D) Tune the SQL for fewer “hot blocks”

Queries that repeatedly hit frequently modified blocks suffer more.

Example:

  • Index on monotonically increasing columns
  • Hot segments (queues, logs, audit tables)

Mitigations:

  • Partitioning tables, indexes, and primary keys
  • Reverse key indexes (see the sketch after this list)
  • Move hot objects off standby usage
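
As a minimal illustration of the reverse-key mitigation (the table and column names are hypothetical), a reverse key index scatters sequential key values across the index leaf blocks instead of concentrating inserts in one hot block:

CREATE INDEX orders_id_rix ON orders (order_id) REVERSE;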

 

 Summary

Point                          | Meaning
-------------------------------|------------------------------------------------
Standby is not just a replica  | It enforces SCN consistency with redo
standby query scn advance      | Query waits for redo apply to reach needed SCN
Short queries can be slow      | If they hit recently modified blocks
Not a bug                      | It is fundamental to Active Data Guard
Solution                       | Tune apply, route queries, redesign hot access

 

 

 

Key points:

Active Data Guard gives you correct data, not necessarily fast data for every workload.
If you need low latency on fresh data, standby is the wrong place.
If you need offload of stable, historical, or analytical queries, standby is ideal.

There are also other parameters and techniques that can further improve standby performance.

In the next post, I will discuss how to achieve a standby database with minimal lag, kept as closely synchronized with the primary as possible.

**************************

Alireza Kamrani

RDBMS/NoSQL Solution Architecture

Saturday, January 31, 2026

Support for Deprioritization of Database Nodes in RAC with JDBC

Support for Deprioritization of Database Nodes in RAC 26ai with JDBC features  

Managing node failures in a RAC environment is an important concern that can be handled in several ways, for example by creating services with the SRVCTL command and using the SCAN listener.


In this post, I introduce another solution that manages these failures through JDBC driver features in Oracle RAC.


Starting from Oracle Database 12c Release 2 (12.2.0.1), JDBC drivers support deprioritization of database nodes. When a node fails, JDBC deprioritizes it for the next 10 minutes, which is the default expiry time. For example, if there are three nodes A, B, C, and node A is down, then connections are allocated first from nodes B and C, and then from node A. After the default expiry time, node A is no longer deprioritized, that is, connections are allocated from all the three nodes on availability basis. Also, during the default expiry time, if a connection attempt to node A succeeds, then node A is no longer considered to be a deprioritized node. You can specify the default expiry time for deprioritization using the oracle.net.DOWN_HOSTS_TIMEOUT system property.
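
For example, to shorten the deprioritization window from the default of 10 minutes (600 seconds) to 2 minutes, the property can be set on the JVM command line (the driver jar and application class names here are placeholders):

java -Doracle.net.DOWN_HOSTS_TIMEOUT=120 -cp ojdbc11.jar:. MyRacApp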


For example, in the following URL, scan_listener0 has the IP addresses ip1, ip2, and ip3 configured. After retrieving these IP addresses, the application connects to the database instances in round-robin order:

IP1-->IP2-->IP3

Now, if ip1 is deprioritized, then the order of trying IP addresses will be ip2, ip3, and then ip1. 

IP2-->IP3-->IP1 (node 1 is deprioritized)


If all of scan_listener0's IP addresses are unavailable, then that whole host is tried last, after trying node_1 and node_2.


(DESCRIPTION_LIST=
    (DESCRIPTION=
        (ADDRESS_LIST=
            (ADDRESS=(PROTOCOL=tcp)(HOST=scan_listener0)(PORT=1521))
            (ADDRESS=(PROTOCOL=tcp)(HOST=node_1)(PORT=1528))
            (ADDRESS=(PROTOCOL=sdp)(HOST=node_2)(PORT=1527))
        )
        (ADDRESS_LIST=
            (ADDRESS=(PROTOCOL=tcp)(HOST=node_3)(PORT=1528))
        )
        (CONNECT_DATA=(SERVICE_NAME=cdb3))
    )
    (DESCRIPTION=
        (ADDRESS=(PROTOCOL=tcp)(HOST=node_0)(PORT=1528))
        (CONNECT_DATA=(SERVICE_NAME=cdb3))
    )
)


Node_0=IP1

Node_1=IP2

Node_2=IP3


What is SDP Protocol:


Oracle Net Services provides support for the Sockets Direct Protocol (SDP) for InfiniBand high-speed networks. 

SDP is a standard communication protocol for clustered server environments. SDP is an interface between a network interface card and the application. By using SDP, applications place most of the messaging burden upon the network interface card, freeing the CPU for other tasks. As a result, SDP decreases network latency and CPU utilization. 

SDP is designed specifically for System Area Networks (SANs). A SAN is characterized by short-distance, high-performance communications between multiple server systems, such as Oracle WebLogic Server or any other third-party middle-tier client and database servers clustered on one switch.


The Sockets Direct Protocol (SDP) is an industry-standard wire protocol between InfiniBand network peers. When used over an InfiniBand network, SDP reduces TCP/IP overhead by eliminating intermediate replication of data and transferring most of the messaging burden away from the CPU and onto the network hardware.


https://docs.oracle.com/en/database/oracle/oracle-database/26/jjdbc/JDBC-getting-started.html#GUID-D5BB4E1F-59FD-4C5E-876C-71420F2DAE9B
