Sunday, June 28, 2026

Creating Efficient RAC Data Objects: Reducing Cache Fusion Overhead and Improving Scalability

Creating Efficient RAC Data Objects: Reducing Cache Fusion Overhead and Improving Scalability

Oracle Real Application Clusters (RAC) provides high availability and horizontal scalability by allowing multiple database instances to access the same database simultaneously. While Oracle RAC's Cache Fusion architecture dramatically reduces disk I/O by transferring blocks directly between instance buffer caches, poorly designed database objects can still create excessive inter-instance traffic and become a significant scalability bottleneck.

One of the most overlooked aspects of RAC performance tuning is the design of database objects themselves. Efficient RAC object design minimizes block contention, reduces global cache transfers, and allows workloads to scale across nodes with minimal synchronization overhead.

Why Object Design Matters in RAC

In a single-instance database, multiple sessions accessing the same block generally remain local to the buffer cache. In RAC, however, if multiple instances require ownership of the same block, Oracle must coordinate access through the Global Cache Service (GCS) and Global Enqueue Service (GES), transferring blocks across the interconnect using Cache Fusion.

Although Cache Fusion is highly optimized, unnecessary block transfers can still increase:

  • Global Cache (GC) waits
  • Interconnect traffic
  • Transaction latency
  • CPU consumption
  • Application response times

The goal of RAC-aware object design is simple:

Ø  Reduce the probability that multiple RAC instances need the same block at the same time.

1. Reduce Rows Per Block (RPB)

One classic RAC optimization is reducing the number of rows stored in a data block.

When a block contains many rows, multiple RAC instances are more likely to require data from that same block. If different nodes frequently update rows within the same block, Oracle must continuously transfer ownership of that block between instances.

Several techniques can reduce Rows Per Block (RPB):

Increase PCTFREE

A higher PCTFREE reserves additional space in each block, resulting in fewer rows per block.

SQL> CREATE TABLE orders (

    order_id NUMBER,

    customer_id NUMBER,

    order_date DATE

)

PCTFREE 30;

 

Benefits:

  • Fewer rows per block
  • Reduced probability of block sharing
  • Lower cache fusion activity

Trade-off:

  • Increased storage consumption

Use Smaller Block Sizes

For highly contended objects, smaller block sizes can reduce the number of rows stored per block and help isolate workload access patterns.

Ø  This technique should be used selectively after testing because it can increase total I/O requirements.

Deliberately Increase Row Size

Historically, some RAC implementations used filler columns to increase row length and reduce block density.

SQL> ALTER TABLE orders ADD rac_padding CHAR(50);

While generally not recommended in modern systems due to storage inefficiency, the concept highlights an important RAC principle:

Reducing block sharing is often more valuable than maximizing storage efficiency.

2. Minimize Index Contention

Indexes are often the largest source of RAC block transfers.

Data blocks may be distributed across the cluster, but index leaf blocks can become hotspots when multiple instances insert or update keys concurrently.

The original RAC design guidance highlights index maintenance as a major source of inter-instance block transfers.

Use Only Necessary Indexes

Every additional index increase:

  • DML overhead
  • Redo generation
  • Cache Fusion traffic

Before creating an index, verify that it provides measurable query benefit.

3. Consider Reverse Key Indexes

Sequential keys generated by sequences often create "right-hand index growth."

All inserts target the same rightmost leaf block, creating a RAC hotspot.

Oracle provides Reverse Key Indexes specifically to address this issue.

CREATE INDEX orders_pk_idx

ON orders(order_id)

REVERSE;

A reverse key index reverses the bytes of the index key, spreading inserts across many leaf blocks instead of concentrating them on the right-hand side of the B-tree. This significantly reduces leaf block contention in RAC environments.

When to Use Reverse Key Indexes

Good candidates:

  • Sequence-generated primary keys
  • High-volume OLTP insert workloads
  • Multi-node RAC systems

Avoid when:

  • Frequent range scans are required

Because reverse key indexes scatter values, range scans become ineffective.

4. Use Locally Managed Tablespaces

Dictionary-managed tablespaces require centralized extent management.

In RAC, extent allocation and deallocation may require inter-instance coordination, increasing contention.

Locally Managed Tablespaces (LMTs) eliminate much of this overhead.

SQL> CREATE TABLESPACE oltp_data DATAFILE '+DATA' SIZE 50G EXTENT MANAGEMENT LOCAL AUTOALLOCATE;

Benefits include:

  • Reduced data dictionary contention
  • Lower extent allocation overhead
  • Better RAC scalability

Oracle has recommended LMTs as a best practice for many releases because they significantly reduce coordination costs.

5. Automate Space Management

Automatic Segment Space Management (ASSM) should be preferred over manual freelist management in most modern RAC deployments.

ASSM automatically distributes inserts across blocks, reducing freelist contention.

CREATE TABLESPACE oltp_data DATAFILE '+DATA' SIZE 50G SEGMENT SPACE MANAGEMENT AUTO;

Benefits:

  • Reduced block contention
  • Simplified administration
  • Better scalability for concurrent DML

6. Pre-Allocate Extents for Heavy Insert Objects

For tables receiving heavy concurrent inserts from multiple RAC nodes, dynamic extent allocation can become a bottleneck.

Pre-allocating extents reduces runtime space management operations.

SQL> ALTER TABLE sales ALLOCATE EXTENT (  SIZE 100M );

Historically, Oracle RAC environments also allowed extent affinity strategies that helped isolate allocation activity to specific instances.

7. Use Sequences Correctly

Sequence design is one of the most important RAC object considerations.

Ø  Avoid ORDERed Sequences

ORDERed sequences force global coordination among instances.

SQL> CREATE SEQUENCE sales_seq NOCACHE ORDER;

This configuration is usually disastrous for RAC scalability.

Prefer Cached Sequences

Oracle recommends cached sequences because they reduce synchronization overhead.

SQL> CREATE SEQUENCE sales_seq CACHE 1000 NOORDER;

Benefits:

  • Fewer data dictionary accesses
  • Reduced row cache lock waits
  • Better scalability under load

Oracle RAC documentation consistently recommends minimizing sequence synchronization and leveraging caching whenever possible.

 

Modern Alternative: Scalable Sequences

For Oracle 18c and newer, scalable sequences can often replace older RAC techniques such as reverse key indexes, providing improved scalability for sequence-generated keys.

8. Optimize Clustering Factor

The clustering factor measures how closely table rows align with index ordering.

A poor clustering factor causes:

  • More logical I/O
  • More block visits
  • Greater likelihood of inter-instance block transfers

Ø  In RAC systems, clustering factor optimization becomes even more important because each additional block access can potentially involve Cache Fusion.

Strategies include:

  • Reviewing column order in composite indexes
  • Reorganizing tables where appropriate
  • Evaluating access patterns before index creation

9. Separate Objects by Access Pattern

Tablespaces should be organized according to workload characteristics.

Recommended separation:

Frequently Modified Objects

  • High DML tables
  • OLTP indexes

Mostly Read-Only Objects

  • Reference tables
  • Lookup tables

Reporting Objects

  • Data warehouse structures
  • Historical data

This organization improves Oracle's Dynamic Resource Mastering, allowing cache resources to remain local to the instances that access them most frequently.

Practical RAC Design Principles

When designing database objects for RAC, focus on these priorities:

1.        Reduce block sharing between instances.

2.        Minimize index hot blocks.

3.        Use cached, non-ordered sequences.

4.        Favor locally managed tablespaces.

5.        Use ASSM for segment management.

6.        Pre-allocate space for heavy insert workloads.

7.        Evaluate reverse key indexes for sequential keys.

8.        Design tablespaces around workload affinity.

9.        Monitor Global Cache waits and tune the objects generating them.

Conclusion

Efficient RAC object design is fundamentally about reducing Cache Fusion traffic. Oracle RAC can automatically manage block transfers and dynamically allocate resources, but application and object design still determine how much inter-instance communication occurs.

The most successful RAC implementations are not necessarily those with the fastest hardware; they are the environments where data structures are designed to keep workload locality high and shared block access low.

By reducing rows per block, eliminating index hotspots, using cached sequences, leveraging locally managed tablespaces, and carefully organizing objects according to access patterns, DBAs can significantly improve RAC scalability while reducing global cache overhead.

In RAC, storage efficiency is often secondary to scalability efficiency. A few extra gigabytes of space are usually far cheaper than thousands of unnecessary Cache Fusion block transfers per second.

 

********************
Alireza Kamrani

Oracle ACE Pro / DATA Tech Lead


Sunday, June 21, 2026

Oracle Redo Log Configuration and I/O design

Oracle Redo Log Configuration and I/O design


In this topic, I will explore best practices for defining and designing the Oracle redo log architecture in an operational environment to achieve better I/O performance and availability.

I/O Configuration and Design

The I/O subsystem is a vital component of an Oracle database. This chapter introduces fundamental I/O concepts, discusses the I/O requirements of different parts of the database, and provides sample configurations for I/O subsystem design.

 

About I/O

Every Oracle AI Database reads or writes data on disk, thus generating disk I/O. The performance of many software applications is inherently limited by disk I/O. Applications that spend majority of their CPU time waiting for I/O activity to complete are said to be I/O-bound.

Oracle AI Database is designed so that if an application is well written, its performance should not be limited by I/O. Tuning I/O can enhance the performance of the application if the I/O system is operating at or near capacity and is not able to service the I/O requests within an acceptable time. However, tuning I/O cannot help performance if the application is not I/O-bound (for example, when CPU is the limiting factor).

Consider the following database requirements when designing an I/O system:

  • Storage, such as minimum disk capacity
  • Availability, such as continuous (24 x 7) or business hours only
  • Performance, such as I/O throughput and application response times

Many I/O designs plan for storage and availability requirements with the assumption that performance will not be an issue. This is not always the case. Optimally, the number of disks and controllers to be configured should be determined by I/O throughput and redundancy requirements. The size of disks can then be determined by the storage requirements.

When developing an I/O design plan, consider using Oracle Automatic Storage Management (Oracle ASM). Oracle ASM is an integrated, high-performance database file system and disk manager that is based on the principle that the database should manage storage instead of requiring an administrator to do it.

Oracle recommends that you use Oracle ASM for your database file storage, instead of raw devices or the operating system file system. Oracle ASM provides the following key benefits:

 

  • Striping
  • Mirroring
  • Online storage reconfiguration and dynamic rebalancing
  • Managed file creation and deletion

1. Deep understanding — why alternate-disk redo design matters

In Oracle Database, redo logging involves two main processes:

Process

Function

I/O Pattern

Critical Concern

LGWR (Log Writer)

Writes redo entries to current redo log members

Sequential, low-latency writes

Latency directly affects commit performance

ARCH (Archiver)

Reads full redo log groups after a log switch and writes archived redo logs to archive destinations

Sequential read + write

Can stall LGWR if it falls behind

 

Contention point

If both LGWR and ARCH operate on the same physical disks, I/O contention occurs:

  • ARCH reads full redo logs while LGWR tries to write to the next one.
  • ARCH → sequential reads
  • LGWR → sequential writes
    → competing I/O queue = slower commit response time and possible "log file switch (archiving needed)" waits.

2. Alternating-disk architecture (revised grouping)

Goal: separate LGWR writes from ARCH reads by alternating redo-members placement across physical disks.

 New grouping (per your request):

Archived Redo Logs

If the archiver is slow, then it might be prudent to prevent I/O contention between the archiver process and LGWR by ensuring that archiver reads and LGWR writes are separated. This is achieved by placing logs on alternating drives.

For example, suppose a system has four redo log groups, each group with two members. To create separate-disk access, the eight log files should be labeled 1a, 1b, 2a, 2b, 3a, 3b, 4a, and 4b. This requires at least four disks, plus one disk for archived files.

The following figure illustrates how redo members should be distributed across disks to minimize contention.

Redo Group

Member A

Member B

Group 1

Disk 1a

Disk 3a

Group 2

Disk 2a

Disk 4a

Group 3

Disk 1b

Disk 3b

Group 4

Disk 2b

Disk 4b

 

 

 

 

 

 

In this example, LGWR switches out of log group 1 (member 1a and 1b) and writes to log group 2 (2a and 2b). Concurrently, the archiver process reads from group 1 and writes to its archive destination. Note how the redo log files are isolated from contention.

Note:

Mirroring redo log files, or maintaining multiple copies of each redo log file on separate disks, does not slow LGWR considerably. LGWR writes to each disk in parallel and waits until each part of the parallel write is complete. Thus, a parallel write does not take longer than the longest possible single-disk write.

Because redo logs are written serially, drives dedicated to redo log activity generally require limited head movement. This significantly accelerates log writing.

Why this helps:
When logs rotate, ARCH will be reading older groups that are placed on different spindles/paths than LGWR’s current target. The alternating pattern reduces read/write head / controller contention and improves commit latency.

  

ASM Implementation Example

Assumptions

  • Linux with oracleasm kernel module installed/configured (Oracle ASMLib).
  • Devices available: /dev/sdb1 … /dev/sdi1 (adjust to your actual device names).
  • Commands that operate on ASM (diskgroup creation) run as the Grid Infrastructure owner (e.g. grid), or use sqlplus / as sysasm for CREATE DISKGROUP.
  • Database SYSDBA commands run on the database instance.

  

Step 0 — Prepare environment variables (example)

# as grid user (adjust ORACLE_HOME and ASM SID)

export ORACLE_HOME=/u01/app/grid

export PATH=$ORACLE_HOME/bin:$PATH

export ORACLE_SID=+ASM

 

Step 1 — Create ASM device entries with oracleasm

Use oracleasm createdisk to expose devices to ASM (ASMLib). Replace /dev/sdb1 → your device path.

# as root (or user with permission to run oracleasm)

# Create 8 ASM device identifiers mapping to physical devices

/usr/sbin/oracleasm createdisk DISK_1A   /dev/sdb1

/usr/sbin/oracleasm createdisk DISK_1B   /dev/sdc1

/usr/sbin/oracleasm createdisk DISK_2A   /dev/sdd1

/usr/sbin/oracleasm createdisk DISK_2B   /dev/sde1

/usr/sbin/oracleasm createdisk DISK_3A   /dev/sdf1

/usr/sbin/oracleasm createdisk DISK_3B   /dev/sdg1

/usr/sbin/oracleasm createdisk DISK_4A   /dev/sdh1

/usr/sbin/oracleasm createdisk DISK_4B   /dev/sdi1

 

 Verify (example):

/usr/sbin/oracleasm listdisks

# or

/usr/sbin/oracleasm querydisk -p DISK_1A   # prints device path mapped

 

Step 2 — Create ASM diskgroups (example)

Two common approaches:

A. Create two diskgroups +REDO_A and +REDO_B grouping A-side and B-side disks
This keeps 1a/2a/3a/4a in one diskgroup and 1b/2b/3b/4b in another.

-- connect as SYSASM

sqlplus / as sysasm

 

CREATE DISKGROUP REDO_A

EXTERNAL REDUNDANCY

  DISK 'ORCL:DISK_1A' NAME redo_1a,

       'ORCL:DISK_2A' NAME redo_2a,

       'ORCL:DISK_3A' NAME redo_3a,

       'ORCL:DISK_4A' NAME redo_4a

  ATTRIBUTE 'au_size'='1048576';

 

CREATE DISKGROUP REDO_B

 EXTERNAL REDUNDANCY

  DISK 'ORCL:DISK_1B' NAME redo_1b,

       'ORCL:DISK_2B' NAME redo_2b,

       'ORCL:DISK_3B' NAME redo_3b,

       'ORCL:DISK_4B' NAME redo_4b

  ATTRIBUTE 'au_size'='1048576';

 

 

Step 3 — Create redo log groups using your requested pairings

You asked for putting 1a & 3a into a redo group, 2a & 4a together, and similarly for the B-side. Following that layout we create 4 groups:

-- connect to the database as SYSDBA

sqlplus / as sysdba

 

-- Group 1: members on disks 1a and 3a

ALTER DATABASE ADD LOGFILE GROUP 1

  ('+REDO_A/redo_group1_1a.log', '+REDO_A/redo_group1_3a.log') SIZE 512M;

 

-- Group 2: members on disks 2a and 4a

ALTER DATABASE ADD LOGFILE GROUP 2

  ('+REDO_A/redo_group2_2a.log', '+REDO_A/redo_group2_4a.log') SIZE 512M;

 

-- Group 3: members on disks 1b and 3b

ALTER DATABASE ADD LOGFILE GROUP 3

  ('+REDO_B/redo_group3_1b.log', '+REDO_B/redo_group3_3b.log') SIZE 512M;

 

-- Group 4: members on disks 2b and 4b

ALTER DATABASE ADD LOGFILE GROUP 4

  ('+REDO_B/redo_group4_2b.log', '+REDO_B/redo_group4_4b.log') SIZE 512M;

 

Important notes:

  • This places the two members of each redo group on disks that you requested (e.g., group 1 uses 1a & 3a).
  • Typically, multiplexing places members on different disk groups or failure groups; here we followed your requested mapping. If you want members to be on different physical controllers, ensure the underlying disks (1a vs 3a) are on different controllers/enclosures or assign them to different ASM failure groups when creating the diskgroup.
  • If you prefer each member to be in a different diskgroup (for stronger isolation), create groups with ('+REDO_A/…', '+REDO_B/…') pairs. Let me know if you want that variant.

 

Step 4 — Separate archive destination (recommended)

Put archived logs on a distinct ASM diskgroup (or filesystem) to eliminate ARCH → LGWR contention:

-- create +ARCH_DG (one example diskgroup)

-- then:

ALTER SYSTEM SET log_archive_dest_1='LOCATION=+ARCH_DG/arch' SCOPE=BOTH;

 

Validation / verification

 

 

Check redo members in database:

SET PAGESIZE 200

COLUMN MEMBER FORMAT A60

SELECT g.group#, g.thread#, g.status, f.member

FROM v$log g JOIN v$logfile f ON g.group# = f.group#

ORDER BY g.group#, f.member;

 

Check ASM disks / diskgroups:

# as grid

asmcmd lsdg

asmcmd lsdsk -k    # show disks and their names

asmcmd lsdsk -G REDO_A

asmcmd lsdsk -G REDO_B

 

Check oracleasm device mappings (if needed):

/usr/sbin/oracleasm listdisks

/usr/sbin/oracleasm querydisk -p DISK_1A

 

Cautions & practical advice:

  • If underlying physical controllers still map multiple 1a/3a disks to the same controller, you'll not gain the intended separation — verify physical controller/enclosure mapping. Use ASM failure groups if you want to express enclosure/controller boundaries in ASM: create diskgroups with FAILGROUP definitions and assign disks accordingly.
  • Test failover and archiving under load: do a log switch and observe ARCH reading and LGWR writing—use iostat / vmstat and asmcmd iostat to verify no hot spots.
  • Keep at least two members per group (multiplexing) so a single disk failure won't block recovery. Consider using NORMAL or HIGH redundancy for extra protection (but that changes how many physical disks required).

 

                                                       Physical Layout (with FailGroup)

Disk

Device

Controller / Enclosure

Purpose

1a

/dev/sdb1

CONTROLLER_A

Redo member (Group 1 / Group 2 A-side)

2a

/dev/sdd1

CONTROLLER_A

Redo member (Group 2 A-side)

3a

/dev/sdf1

CONTROLLER_B

Redo member (Group 1 / Group 2 A-side)

4a

/dev/sdh1

CONTROLLER_B

Redo member (Group 2 A-side)

1b

/dev/sdc1

CONTROLLER_C

Redo member (Group 3 / Group 4 B-side)

2b

/dev/sde1

CONTROLLER_C

Redo member (Group 3 / Group 4 B-side)

3b

/dev/sdg1

CONTROLLER_D

Redo member (Group 3 / Group 4 B-side)

4b

/dev/sdi1

CONTROLLER_D

Redo member (Group 4 B-side)

 

ASM Diskgroup Creation Strategy

We’ll create two ASM diskgroups for redo logs:

+REDO_A — for A-side disks (1a–4a), mirrored between CONTROLLER_A and CONTROLLER_B

+REDO_B — for B-side disks (1b–4b), mirrored between CONTROLLER_C and CONTROLLER_D

 

Each diskgroup uses NORMAL REDUNDANCY, so ASM keeps one mirrored copy on a different failgroup.

 

SQL Commands (run as SYSASM)

-- Diskgroup for A-side redo

 

CREATE DISKGROUP REDO_A NORMAL REDUNDANCY

  FAILGROUP FG_A1 DISK

 'ORCL:DISK_1A',     'ORCL:DISK_2A' 

FAILGROUP FG_A2 DISK  

 'ORCL:DISK_3A',    'ORCL:DISK_4A'

  ATTRIBUTE

    'au_size'='4M',

    'compatible.asm'='19.0.0.0.0',

    'compatible.rdbms'='19.0.0.0.0',

    'sector_size'='512',

    'disk_repair_time'='3.6h';

 

 -- Diskgroup for B-side redo

CREATE DISKGROUP REDO_B NORMAL REDUNDANCY

  FAILGROUP FG_B1 DISK

    'ORCL:DISK_1B',

    'ORCL:DISK_2B'

  FAILGROUP FG_B2 DISK

    'ORCL:DISK_3B',

    'ORCL:DISK_4B'

  ATTRIBUTE

    'au_size'='4M',

    'compatible.asm'='19.0.0.0.0',

    'compatible.rdbms'='19.0.0.0.0',

    'sector_size'='512',

    'disk_repair_time'='3.6h';

 

 Notes:

  • NORMAL REDUNDANCY mirrors each extent between FG_A1 and FG_A2 (and similarly FG_B1 and FG_B2).
  • This ensures no mirror copies reside on the same physical controller.
  • au_size=4M is optimal for redo diskgroups because redo I/O is mostly sequential and benefits from larger allocation units.
  • disk_repair_time allows temporary offline disks to auto-recover without forced rebalance.

Example Verification (after creation)

asmcmd lsdg

asmcmd lsdsk -G REDO_A -k

asmcmd lsdsk -G REDO_B -k

 

Expected output snippet:

Group

Failgroup

Disk

Path

State

REDO_A

FG_A1

DISK_1A

ORCL:DISK_1A

NORMAL

REDO_A

FG_A1

DISK_2A

ORCL:DISK_2A

NORMAL

REDO_A

FG_A2

DISK_3A

ORCL:DISK_3A

NORMAL

REDO_A

FG_A2

DISK_4A

ORCL:DISK_4A

NORMAL

REDO_B

FG_B1

DISK_1B

ORCL:DISK_1B

NORMAL

REDO_B

FG_B1

DISK_2B

ORCL:DISK_2B

NORMAL

REDO_B

FG_B2

DISK_3B

ORCL:DISK_3B

NORMAL

REDO_B

FG_B2

DISK_4B

ORCL:DISK_4B

NORMAL

 

Then Create Redo Log Groups in DB

-- as SYSDBA (not SYSASM)

ALTER DATABASE ADD LOGFILE GROUP 1

  ('+REDO_A/redo_g1_1a.log', '+REDO_A/redo_g1_3a.log') SIZE 512M;

ALTER DATABASE ADD LOGFILE GROUP 2

  ('+REDO_A/redo_g2_2a.log', '+REDO_A/redo_g2_4a.log') SIZE 512M;

ALTER DATABASE ADD LOGFILE GROUP 3

  ('+REDO_B/redo_g3_1b.log', '+REDO_B/redo_g3_3b.log') SIZE 512M;

ALTER DATABASE ADD LOGFILE GROUP 4

  ('+REDO_B/redo_g4_2b.log', '+REDO_B/redo_g4_4b.log') SIZE 512M;

 

Performance & Design Best Practices

Topic

Recommendation

I/O Distribution

Always verify that each failgroup maps to a separate controller path.

Redo-only Diskgroup

Keep redo logs in dedicated ASM diskgroups to avoid random I/O interference.

AU Size

4 MB or 8 MB AU sizes are optimal for redo workloads.

ASM Rebalance Power

Set ASM_POWER_LIMIT to 8–12 during maintenance to avoid I/O stalls.

Arch Dest Separation

Store archives in a distinct diskgroup +ARCH_DG or filesystem to prevent LGWR-ARCH contention.

 

 

Creating Efficient RAC Data Objects: Reducing Cache Fusion Overhead and Improving Scalability

Creating Efficient RAC Data Objects: Reducing Cache Fusion Overhead and Improving Scalability Oracle Real Application Clusters (RAC) provi...