Database-Box: July 2025

Wednesday, July 30, 2025

Assessing and Optimizing Network Performance in the Oracle Data Guard Environment

Written by :Alireza Kamrani

Oracle Data Guard relies on the underlying network to send redo from the primary database to standby databases. Ensuring that the network is healthy and capable of supporting peak redo generation rates helps avoid future transport lags.

A transport lag forms when the primary database cannot ship redo to the standby faster than primary instance's redo generation rate. A transport lag can lead to potential data loss if a primary database failure occurs.

Network assessment consists of evaluating

Network reliability
Network bandwidth to accommodate peak redo generation rates

Note:

Each instance of the primary database instance generates its own redo and ships redo to the standby database in a single network stream. Therefore, maximizing single process network throughput for each node is critical for redo transport.

Historically there are areas that can reduce network and redo transport throughput resulting in potential transport lags:

Network firewalls or network encryption
Network firewalls and network (not Oracle Net) encryption can reduce overall throughput significantly. Verify throughput with the oratcp tool (described below), with and without encryption, and tune accordingly.
At times reducing the encryption level can increase throughput significantly. A balance is required to meet security needs with your performance and data loss requirements.
Redo transport compression
When database initialization parameter has LOG_ARCHIVE_DEST_N attribute COMPRESSION=ENABLE, Oracle background processes have to compress the redo before sending network message, and uncompress the redo before processing the redo. This reduces the overall redo and network throughput. Compression is only recommended if network bandwidth is insufficient between the primary and standby destinations.
Oracle Net encryption
Depending on the Oracle Net encryption level, this will have varying redo throughput impact, because Oracle Net messages containing redo have to be encrypted before sending and then unencrypted before redo processing.
Note that if database encryption is already enabled with Transparent Data Encryption (TDE), redo is already encrypted, although Oracle Net encryption can also encrypt the message headers.
Untuned network for redo transport

Increasing maximum operating system socket buffer size can increase single process throughput by 2-8 times. Test with different socket buffer sizes to see what value yields positive results, and ensure throughput is greater than the peak redo throughput.
Compare performance with various MTU settings.
If average redo write size is less than 1500 bytes, then try various MTU settings including MTU=9000 (For example, Jumbo Frames) for network interface that sends or receives redo on your system. This may reduce some unnecessary network round trips which will increase overall throughput.
Also note that for SYNC transport, Oracle's average redo write size (for example, Oracle message send) increases significantly as determined by v$sysstats or AWR statistics "redo size / redo writes".
When sending redo across geographical regions, experiments have shown that using MTU=9000 can also benefit in some network topologies. Conduct performance tests with oratcp and compare the results with default MTU and MTU=9000 settings.

Gather Topology Information

Understanding the topology of the Oracle Data Guard configuration, and its relevance to Data Guard performance, helps eliminate infrastructure weaknesses that are often incorrectly attributed to the Data Guard architecture.

Oracle recommends that you outline the following high-level architecture information.

Describe the primary and standby database system (number of nodes in Oracle RAC cluster, CPUs and memory per database node, storage I/O system)
Describe network topology connecting the primary and standby systems

Network switches and firewalls in between primary and standby
Network bandwidth and latency

For standby databases with symmetric hardware and configuration, and with a well tuned network configuration that can support peak redo generation rates, the transport lag should be less than 1 second.

Understanding Network Usage of Data Guard

The phases of the Data Guard life cycle which use the network most heavily are:

Instantiation - During this phase of standby database creation, files can be copied using parallelism from any host. Determining the degree of parallelism which maximizes throughput between nodes helps to optimize the standby instantiation process.
Redo Transport (Steady State)- Under normal Data Guard operation the primary database ships redo to the standby which is then applied. Each RAC instance of a primary database ships redo in a single stream from the host on which it is running. Understanding the requirements of each primary database instance and ensuring a single process can achieve the throughput requirements is critical to a standby database staying current with the primary database.

Understanding Targets and Goals for Instantiation

Instantiation for large databases can take hours, or in extreme cases days. To allow for planning of an instantiation and also maximize throughput between the primary and standby system to complete the instantiation is as timely a manner as possible, first determine the goal for instantiation time. Then follow the process defined below to maximize per process throughput and identify the optimal degree of parallelism between the primary and standby nodes.

Understanding Throughput Requirements and Average Redo Write Size for Redo Transport

Required network bandwidth of a given Data Guard configuration is determined by the redo generate rate of the primary database.

Note:

In cases where the primary database is pre-existing, a baseline for the required network bandwidth can be established. If there is no existing primary database, skip this step and future references to the data further in the process.

While the Automatic Workload Repository (AWR) tool can be used to determine the redo generation rate, the snapshots are often 30 or 60 minutes apart which can dilute the peak rate. Since peak rates often occur for shorter periods of time, it is more accurate to use the following query which calculates the redo generation rate for each log when run on an existing database. (change the timestamps as appropriate)

SQL> SELECT THREAD#, SEQUENCE#, BLOCKS*BLOCK_SIZE/1024/1024 MB,

(NEXT_TIME-FIRST_TIME)*86400 SEC,

(BLOCKS*BLOCK_SIZE/1024/1024)/((NEXT_TIME-FIRST_TIME)*86400) "MB/S"

FROM V$ARCHIVED_LOG

WHERE ((NEXT_TIME-FIRST_TIME)*86400<>0)

AND FIRST_TIME BETWEEN TO_DATE('2022/01/15 08:00:00','YYYY/MM/DD HH24:MI:SS')

AND TO_DATE('2022/01/15 11:00:00','YYYY/MM/DD HH24:MI:SS')

AND DEST_ID=1 ORDER BY FIRST_TIME;

Example output:

THREAD# SEQUENCE# MB SEC MB/s

------- --------- ---------- ---------- ----------

2 2291 29366.1963 831 35.338383

1 2565 29365.6553 781 37.6000708

2 2292 29359.3403 537 54.672887

1 2566 29407.8296 813 36.1719921

2 2293 29389.7012 678 43.3476418

2 2294 29325.2217 1236 23.7259075

1 2567 11407.3379 2658 4.29169973

2 2295 24682.4648 477 51.7452093

2 2296 29359.4458 954 30.7751004

2 2297 29311.3638 586 50.0193921

1 2568 3867.44092 5510 .701894903

Note

To find the peak redo rate, choose times during the highest level of processing, such as peak OLTP periods, End of Quarter batch processing or End of Year batch processing.

♦️In this short example the highest rate was about 52MB/s. Ideally the network will support the maximum rate plus 30% or 68MB/s for this application.

Verify Average Redo Write Size

Using v$sysstats or looking at your AWR reports for various workload and peak intervals, record the average redo write size based on

Average Redo Write Size = "REDO SIZE" / "REDO WRITES"

♦️Use this average redo write size in your oratcp experiments. If the average redo write size > 1500 bytes, experiment with various MTU settings.

Understand Current Network Throughput

The Oracle utility oratcptest is a general-purpose tool for measuring network bandwidth and latency similar to iperf/qperf which can be run by any OS user.

The oratcptest utility provides options for controlling the network load such as:

Network message size
Delay time between messages
Parallel streams
Whether or not the oratcptest server should write messages on disk.
Simulating Data Guard SYNC transport by waiting for acknowledgment (ACK) of a packet or ASYNC transport by not waiting for the ACK.

Note

This tool, like any Oracle network streaming transport, can simulate efficient network packet transfers from the source host to target host similar to Data Guard transport. Throughput can saturate the available network bandwidth between source and target servers.

♦️Therefore, Oracle recommends that short duration tests are performed and that consideration is given for any other critical applications sharing the same network.

Measure the Existing Throughput of One and Many Processes

Do the following tasks to measure the existing throughput.

Task 1: Install oratcptest

Download the oratcptest.jar file from MOS note 2064368.1
Copy the JAR file onto both client (primary) and server (standby)

Note:
oratcptest requires JRE 6 or later
Verify that the host has JRE 6 or later
On all primary and standby hosts, verify that the JVM can run the JAR file by displaying the help
# java -jar oratcptest.jar -help

Task 2: Determine the Existing Throughput for a Single Process

Data Guard asynchronous redo transport (ASYNC) uses a streaming protocol which does not wait for packet acknowledgment and therefore achieves higher rates than SYNC transport.

Start the test server on the receiving (standby) side.

java -jar oratcptest.jar -server [IP of standby host or VIP in RAC
configurations] -port=<any available port number>
Run the test client. (Change the server address and port number to match that of your server started in step 4.)
$ java -jar oratcptest.jar [IP of standby host or VIP in RAC configurations] -port=<port number> -mode=async -duration=120 -interval=20s [Requesting a test] Message payload = 1 Mbyte Payload content type= RANDOM Delay between messages = NO Number of connections = 1 Socket send buffer = (system default) Transport mode = ASYNC Disk write = NO Statistics interval = 20 seconds Test duration= 2 minutes Test frequency = NO Network Timeout = NO (1 Mbyte = 1024x1024 bytes) (17:54:44) The server is ready. Throughput (17:55:04) 20.919 Mbytes/s (17:55:24) 12.883 Mbytes/s (17:55:44) 10.457 Mbytes/s (17:56:04) 10.408 Mbytes/s (17:56:24) 12.423 Mbytes/s (17:56:44) 13.701 Mbytes/s (17:56:44) Test finished. Socket send buffer = 2 Mbytes Avg. throughput = 13.466 Mbytes/s

♦️In this example the average throughput between these two nodes was about 13 MB/s which does not meet the requirements of 68 MB/s from the query.

Note

This process can be scheduled to run at a given frequency using the -freq option to determine if the bandwidth varies at different times of the day. For instance setting -freq=1h/24h will repeat the test every hour for 24 hours.

Task 3: Determine Existing Throughput for Multiple Processes

Repeat the previous test with two (2) connections (using num_conn parameter).

$ java -jar oratcptest.jar <target IP address> -port=<port number> -duration=60s -interval=10s -mode=async [-output=<results file>] -num_conn=2 [Requesting a test] Message payload = 1 Mbyte Payload content type = RANDOM Delay between messages = NO Number of connections = 2 Socket send buffer = (system default) Transport mode = ASYNC Disk write = NO Statistics interval = 20 seconds Test duration = 2 minutes Test frequency = NO Network Timeout = NO (1 Mbyte = 1024x1024 bytes) (18:08:02) The server is ready Throughput (18:08:22) 44.894 Mbytes/s (18:08:42) 23.949 Mbytes/s (18:09:02) 25.206 Mbytes/s (18:09:22) 23.051 Mbytes/s (18:09:42) 24.978 Mbytes/s (18:10:02) 22.647 Mbytes/s (18:10:02) Test finished. Avg. socket send buffer = 2097152 Avg. aggregate throughput = 27.454 Mbytes/s
Re-run step 1 Iteratively and increase the value of num_conn by two each time until the aggregate throughput does not increase for three consecutive values. For example if the aggregate throughput is approximately the same for 10, 12 and 14 connections, stop.
Note
RMAN can utilize all nodes in the cluster for instantiation. To find the total aggregate throughput, see My Oracle Support Creating a Physical Standby database using RMAN restore database from service (Doc ID 2283978.1)
Run the same test with all nodes in all clusters to find the current total aggregate throughput. Node 1 of primary to node 1 of standby, node 2 to node 2, etc. Sum the throughput found for all nodes.
Reverse the roles and repeat the tests.
Note the number of connections which achieved the best aggregate throughput.

Use the total size of the database and total aggregate throughput to estimate the amount of time it will take to complete the copy of the database.

A full instantiation also needs to apply the redo generated during the copy. Some additional percentage (0%-50%) should be added to this estimated time based on how active the database is.

If the estimated time meets the goal, no additional tuning is required for instantiation.

Sunday, July 20, 2025

Oracle Data Access Methods in Action

Oracle Data access methods in Action

Alireza Kamrani 20 Jul 2025

Data Accesses

To use the data stored in database, applications need to expose it or retrieve it. That is the task of data accesses provided by Oracle.

Based on data locality, they are differentiated as physical read and logical read. Logical read is further divided into consistent gets and current gets.

Physical Read

To access any data, firstly Oracle has to move it from disk (persistent mass storage) to memory (volatile main storage), that is, physical read (disk read, db file read, cold read).

Oracle provides 3 basic approaches of db fille read:

(a). db fille sequential read

(b). db file scattered read

(c). db file parallel read

In this Section, we will look into different access paths and investigate their executions with tools like:

(1). Sql Trace

(2). Dtrace

(3). Oracle View: v$filestat and v$iostat_file

Note: All tests are done in Oracle 12.1.0.2 on Solaris.

In the following test, we create a table and one index on it, each row occupies about 1 DB block (db block size = 8192).

Full test code is appended at the end of this section.

SQL> create table test_tab tablespace test_ts as

select level x, rpad('ABC', 3500, 'X') y, rpad('ABC', 3500, 'X') z from dual connect by level <= 1e4;

SQL> create index test_tab#i1 on test_tab(x) tablespace test_ts;

SQL> select round(bytes/1024/1024) mb, blocks from dba_segments where segment_name = 'TEST_TAB';

--80 10240

DB File Read Access Path

We will run 4 variants of access path tests, and measure their performance in Sql Trace (event 10046) and Dtrace (see appended Dtrace Script).

Test-1 Single Read

As the first test, we will select 333 adjacent rows by rowid. Here the Sql Trace output:

SQL > exec db_file_read_test('single', 1, 333);

-- adjacent rowid, single block read, 'db file sequential read'

SELECT /*+ single_read */ Y FROM TEST_TAB T WHERE ROWID = :B1

call count cpu elapsed disk query current rows

------- ------ -------- ---------- ---------- ---------- ---------- ----------

Parse 1 0.00 0.00 0 0 0 0

Execute 333 0.01 0.01 0 0 0 0

Fetch 333 0.01 0.01 641 333 0 333

------- ------ -------- ---------- ---------- ---------- ---------- ----------

total 667 0.02 0.02 641 333 0 333

Row Source Operation

---------------------------------------------------

TABLE ACCESS BY USER ROWID TEST_TAB (cr=1 pr=8 pw=0 time=160 us cost=1 size=3513 card=1)

Event waited on Times Max. Wait Total Waited

---------------------------------------- Waited ---------- ------------

db file scattered read 44 0.00 0.00

db file sequential read 289 0.00 0.00

To read 333 rows, we perform 44 scattered read and 289 sequential reads, in total, 333 reads.

However, 641 blocks are read into memory because of scattered read.

Dtrace output reveals more details about lower OS layer calls:

PROBEFUNC FD RETURN_SIZE COUNT

lseek 260 0 44

readv 260 65536 44

pread 260 8192 289

PROBEFUNC FD MAX_READ_Blocks

pread 260 1

readv 260 8

TOTAL_SIZE = 5251072 , TOTAL_READ_Blocks = 641 , TOTAL_READ_CNT = 333

readv 260

value ------------------------------------------ Distribution ---------------------------------------- count

8192 | 0

16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 43

32768 |@ 1

65536 | 0

pread 260

value ------------------------------------------ Distribution ------------------------------------------ count

2048 | 0

4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 226

8192 |@@@@@@@@ 60

16384 | 2

32768 | 0

65536 | 1

131072 | 0

44 scattered read are ful_lled by 44 readv from filedescriptor FD: 260 (which are proceeded by 44 lseek), each of which fetches 65536 bytes (8 DB blocks).

289 sequential read are done by 289 pread, each of which fetches 8192 bytes (1 DB block).

Totally we read 641 DB blocks in 333 read OS calls.

Now we look the Dtrace quantize (frequency distribution diagram) output, in which the values in all lines

are always increased by power-of-two in nanoseconds. Each line indicates the count of the number of

elements greater than or equal to the corresponding value, but less than the next larger row value. It is

similar to Oracle Wait Event Histogram (for instance, v$event histogram).

The whole elapsed time (multiplied by 1.5 to get average value) can be estimated as:

readv: (16384*43 + 32768*1)*1.5 = 1105920

pread: (4096*226 + 8192*60 + 16384*2 + 65536*1)*1.5 = 2273280

total: 1105920 + 2273280 = 3379200

The total elapse time of 3 millisecond (3379200 ns) in Dtrace is much less than xplan 20 millisecond (0.02 second) since Dtrace only collects time of OS IO activities, the other 17 ms could be consumed in the

DB side. For example, in the above xplan, execute phase took 10 millisecond (0.01 second), whereas two Wait Events there: db file scattered read and db file sequential read having Total Waited equal to 0.00

(the minimum time unit in xplan is centisecond, which seems inherited from old Oracle hundredths of a second counting).

We can also compare elapsed time per block read for readv (8 blocks per read request), and pread (1 block per read request),

thereby evaluate the exact performance difference between single block read and multi block read.

The result shows that readv is 2 times faster than pread per block read.

readv: (16384*43 + 32768*1)*1.5/8/44 = 3142

pread: (4096*226 + 8192*60 + 16384*2 + 65536*1)*1.5/289 = 7866

Test-2 Scattered Read

In the second test, we also select 333 rows by rowid. Instead of adjacent rows, we read one row after

skipping 10 rows (see appended Test Code).

Here the Sql Trace output:

SQL > exec db_file_read_test('scattered', 1, 333);

-- jumped rowid, scattered read, 'db file scattered read'

SELECT /*+ scattered_read */ Y FROM TEST_TAB T WHERE ROWID =: B1

call count cpu elapsed disk query current rows

------- ------ -------- ---------- ---------- ---------- ---------- ----------

Parse 1 0.00 0.00 0 0 0 0

Execute 333 0.00 0.00 0 0 0 0

Fetch 333 0.02 0.02 2664 333 0 333

------- ------ -------- ---------- ---------- ----------- ---------- ---------

total 667 0.02 0.02 2664 333 0 333

Row Source Operation

---------------------------------------------------

TABLE ACCESS BY USER ROWID TEST_TAB (cr=1 pr=8 pw=0 time=156 us cost=1 size=3513 card=1)

Elapsed times include waiting on following events:

Event waited Times Max. Wait Total Waited

--------------------------- Waited ---------- ------------

db file scattered read 333 0.00 0.00

Oracle chooses db file scattered read to fetch all 333 rows with 2664 disk reads.

But xplan looks identical as single read, so xplan alone is not able to reveal the difference.

But Dtrace output shows the di_erence:

------------------------------ dtrace ------------------------------

PROBEFUNC FD RETURN_SIZE COUNT

Lseek 260 0 91

readv 260 65536 333

PROBEFUNC FD MAX_READ_Blocks

Readv 260 8

TOTAL_SIZE = 21823488 , TOTAL_READ_Blocks = 2664 , TOTAL_READ_CNT = 333

readv 260

value ------------- Distribution ------------- count

8192 | 0

16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 290

32768 |@@@@@ 43

65536 | 0

Each readv request returns 8 DB blocks, 333 readv accumulated to exactly 333 x 8 = 2664.

In other hand, 91 lseek moving probably indicates that most of blocks are located next to each other.

Test-3 Parallel Read

In the next test, we read 333 rows by index range scan. Sql Trace shows the third type of db file read:

db file parallel read. In the output, we also include part of Raw Trace file.

SQL > exec db_file_read_test('parallel', 1, 333);

SELECT /*+ index(t test_tab#i1) parallel_read */ MAX(Y) FROM TEST_TAB T WHERE X BETWEEN 1 AND :B1

call count cpu elapsed disk query current rows

------- ------ -------- ---------- ---------- ---------- ---------- ----------

Parse 1 0.00 0.00 0 0 0 0

Execute 1 0.00 0.00 0 0 0 0

Fetch 1 0.00 0.00 344 335 0 1

------- ------ -------- ---------- ---------- ---------- ---------- ----------

total 3 0.00 0.00 344 335 0 1

Row Source Operation

---------------------------------------------------

SORT AGGREGATE (cr=335 pr=344 pw=0 time=3760 us)

FILTER (cr=335 pr=344 pw=0 time=1698 us)

TABLE ACCESS BY INDEX ROWID BATCHED TEST_TAB (cr=335 pr=344 pw=0 time=1361 us cost=168 size=1167165 card=333)

INDEX RANGE SCAN TEST_TAB#I1 (cr=2 pr=8 pw=0 time=279 us cost=1 size=0 card=333)(object id 2260477)

Event waited on Times Max. Wait Total Waited

---------------------------------------- Waited------------ ---------- ------------

db file scattered read 4 0.00 0.00

db file parallel read 2 0.00 0.00

-- Raw Trace File --

'db file scattered read' ela= 49 file#=917 block#=10368 blocks=8 obj#=2260477 (Index TEST_TAB#I1)

'db file scattered read' ela= 27 file#=917 block#=128 blocks=8 obj#=2260476 (Table TEST_TAB)

'db file scattered read' ela= 21 file#=917 block#=136 blocks=8 obj#=2260476

'db file parallel read' ela= 422 files=1 blocks=127 requests=127 obj#=2260476

'db file parallel read' ela= 334 files=1 blocks=127 requests=127 obj#=2260476

'db file scattered read' ela= 264 file#=917 block#=409 blocks=66 obj#=2260476

Look Raw Trace file, first 3 lines are db file scattered read with blocks=8 (one of which is to read index

TEST TAB#I1), then 2 lines of db file parallel read with both blocks=127 and requests=127, last line is

one db file scattered read with blocks=66. In total, we made 3*8 + 2*127 + 66 = 344 disk reads in 258 read requests.

Dtrace Ouput shows more details of OS calls:

PROBEFUNC FD RETURN_SIZE COUNT

pread 260 540672 1

lseek 260 0 2

readv 260 65536 3

pread 260 8192 254

PROBEFUNC FD MAX_READ_Blocks

Readv 260 8

pread 260 66

TOTAL_SIZE = 2818048 , TOTAL_READ_Blocks = 344 , TOTAL_READ_CNT = 258

readv 260

value -------------------------------- Distribution ------------------------------- count

8192 | 0

16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@ 2

32768 |@@@@@@@@@@@@@ 1

65536 | 0

pread 260

value -------------------------------- Distribution -------------------------------- count

2048 | 0

4096 |@@@@@@@@@@@@@@@@@@@@@@@@@ 196

8192 |@@@@@@@@ 52

16384 | 2

32768 | 0

65536 |@ 4

131072 | 1

262144 | 0

Crosschecking Sql Raw Trace with Dtrace, we can see:

3 blocks=8 db file scattered read are implemented by 3 readv with RETURN SIZE=65536 each.

2 blocks=127 db file parallel read are satisfied by 254 pread with RETURN SIZE=8192 each.

1 blocks=66 db file scattered read is done by 1 pread with RETURN SIZE=540672(=66*8192).

In total, we read 344 DB blocks by 258 (=3+254+1) OS read calls.

The last db file scattered read with blocks=66 also shows that one pread can read 66 blocks, much higher than db file multiblock read count=32 configured in this database.

Since 66 is not divisible by 32, it is probably an OS disk read optimization (disk read merging) for Oracle "Batched" reads, which is visible in xplan as "table access by index rowid batched".

Such kind of pread is triggered after low level OS optimization, that is probably why db file multiblock read count=32 has no effect there.

"Batched" reads is controlled by Oracle 12c hidden parameter optimizer batch table access by rowid (enable table access by ROWID IO batching), or 11g nlj batching enabled (enable batching of the RHS IO in NLJ).

For example, "Batched" can be disabled by:

SELECT /*+ index(t test_tab#i1) opt_param('_optimizer_batch_table_access_by_rowid', 'false') parallel_read */ MAX(Y)

FROM TEST_TAB T WHERE X BETWEEN 1 AND :B1;

In xplan, db file parallel read is indicated with Times Waited being 2, but real OS calls are 254 pread

requests. We will discuss it later on AIO read.

By the way, we have 3 readv, but only 2 lseek, so there are probably 2 readv share one lseek.

From above Sql Trace output and Dtrace output, we can see that pread can ful_ll both db file parallel

read and db files cattered read. Back to Test-1 Single Read, in which db file sequential read is also

performed by pread, we can say pread is universal for all 3 types of db file reads.

Test-4 Full Read

As the last test, we read 333 rows by a full table scan. Here the Sql Trace output including its raw trace

lines:

SQL > exec db_file_read_test('full', 1, 333);

SELECT /*+ full_read */ MAX(Y) FROM TEST_TAB T WHERE ROWNUM <= :B1

call count cpu elapsed disk query current rows

------- ------ -------- ---------- ---------- ---------- ---------- ----------

Parse 1 0.00 0.00 0 0 0 0

Execute 1 0.00 0.00 0 0 0 0

Fetch 1 0.00 0.00 342 342 3 1

------- ------ -------- ---------- ---------- ---------- ---------- ----------

total 3 0.00 0.00 342 342 3 1

Row Source Operation

---------------------------------------------------

SORT AGGREGATE (cr=342 pr=342 pw=0 time=3788 us)

COUNT STOPKEY (cr=342 pr=342 pw=0 time=1371 us)

TABLE ACCESS FULL TEST_TAB (cr=342 pr=342 pw=0 time=925 us cost=99 size=1169334 card=334)

Event waited on Times Max. Wait Total Waited

---------------------------------------- Waited------------------ ---------- ------------

db file sequential read 2 0.00 0.00

db file scattered read 23 0.00 0.00

-- Raw Trace File --

'db file sequential read' ela= 19 file#=917 block#=130 blocks=1 obj#=2260476 tim=647299975335

'db file sequential read' ela= 14 file#=3 block#=768 blocks=1 obj#=0 tim=647299975399

-- UNDO file#=3 /oratestdb/oradata/testdb/undo01.dbf

'db file scattered read' ela= 22 file#=917 block#=131 blocks=5 obj#=2260476 tim=647299975501

'db file scattered read' ela= 25 file#=917 block#=136 blocks=8 obj#=2260476 tim=647299975609

'db file scattered read' ela= 25 file#=917 block#=145 blocks=7 obj#=2260476 tim=647299975713

'db file scattered read' ela= 23 file#=917 block#=152 blocks=8 obj#=2260476 tim=647299975806

'db file scattered read' ela= 25 file#=917 block#=161 blocks=7 obj#=2260476 tim=647299975901

'db file scattered read' ela= 24 file#=917 block#=168 blocks=8 obj#=2260476 tim=647299975994

'db file scattered read' ela= 23 file#=917 block#=177 blocks=7 obj#=2260476 tim=647299976088

'db file scattered read' ela= 23 file#=917 block#=184 blocks=8 obj#=2260476 tim=647299976178

'db file scattered read' ela= 23 file#=917 block#=193 blocks=7 obj#=2260476 tim=647299976270

'db file scattered read' ela= 22 file#=917 block#=200 blocks=8 obj#=2260476 tim=647299976364

'db file scattered read' ela= 23 file#=917 block#=209 blocks=7 obj#=2260476 tim=647299976465

'db file scattered read' ela= 22 file#=917 block#=216 blocks=8 obj#=2260476 tim=647299976554

'db file scattered read' ela= 22 file#=917 block#=225 blocks=7 obj#=2260476 tim=647299976646

'db file scattered read' ela= 29 file#=917 block#=232 blocks=8 obj#=2260476 tim=647299976759

'db file scattered read' ela= 24 file#=917 block#=241 blocks=7 obj#=2260476 tim=647299976866

'db file scattered read' ela= 23 file#=917 block#=248 blocks=8 obj#=2260476 tim=647299976956

'db file scattered read' ela= 128 file#=917 block#=258 blocks=32 obj#=2260476 tim=647299977200

'db file scattered read' ela= 95 file#=917 block#=290 blocks=32 obj#=2260476 tim=647299977511

'db file scattered read' ela= 97 file#=917 block#=322 blocks=32 obj#=2260476 tim=647299977822

'db file scattered read' ela= 87 file#=917 block#=354 blocks=30 obj#=2260476 tim=647299978113

'db file scattered read' ela= 96 file#=917 block#=386 blocks=32 obj#=2260476 tim=647299978407

'db file scattered read' ela= 108 file#=917 block#=418 blocks=32 obj#=2260476 tim=647299978719

'db file scattered read' ela= 94 file#=917 block#=450 blocks=32 obj#=2260476 tim=647299979021.

We made 2 db file sequential read with blocks=1, and 23 db file scattered read with blocks varied from 5 to 32, but summing up them together,

it is 342 disk reads. In this case, the maximum blocks=32 is probably dictated by db file multiblock read count.

Note that the second db file sequential read is to read undo data block (file#=3 block#=768 obj#=0).

It is visible in Raw Trace, and counted in xplan statistics.

Look again Dtrace output:

PROBEFUNC FD RETURN_SIZE COUNT

Pread 260 8192 1

pread 260 245760 1

readv 260 40960 1

pread 260 262144 6

readv 260 57344 7

lseek 260 0 8

readv 260 65536 8

PROBEFUNC FD MAX_READ_Blocks

readv 260 8

pread 260 32

TOTAL_SIZE = 2793472 , TOTAL_READ_Blocks = 341 , TOTAL_READ_CNT = 24

readv 260

value ----------------------------- Distribution -------------------------- count

8192 | 0

16384 |@@@@@@@@@@@@@@@@@@ 16

32768 | 0

pread 260

value ----------------------------- Distribution ----------------------------- count

16384 | 0

32768 |@@@@@@@@@@@@@@@@@@@@ 4

65536 |@@@@@@@@@@@@@@@ 3

131072 |@@@@@ 1

262144 | 0

Crosscheck Sql Raw Trace with Dtrace, we can see:

1 blocks=1 db file sequential read is implemented by 1 pread with RETURN SIZE=8192.

1 blocks=5 db file scattered read is implemented by 1 readv with RETURN SIZE=40960.

7 blocks=7 db file scattered read is implemented by 7 readv with RETURN SIZE=57344.

8 blocks=8 db file scattered read is implemented by 8 readv with RETURN SIZE=65536.

1 blocks=30 db file scattered read is implemented by 1 pread with RETURN SIZE=245760.

6 blocks=32 db file scattered read is implemented by 6 pread with RETURN SIZE=262144.

In total, we read 341 DB blocks by 24 OS read calls.

Sql Trace showed 25 (=23+2) reads to get 342 blocks, Dtrace showed 24 reads to get 341 blocks, one extra

read in Sql Trace is due to one undo read (file#=3). (Note: in Dtrace script, we only trace FD=260,

that is file#=917, undo file#=3 is not traced)

If we display segment extent allocations by two queries below:

SQL > select segment_type, segment_subtype, header_block, blocks, extents, initial_extent, next_extent from dba_segments v where segment_name = 'TEST_TAB';

SEGMENT_TYPE SEGMENT_SU HEADER_BLOCK BLOCKS EXTENTS INITIAL_EXTENT NEXT_EXTENT

------------- ---------- ------------ ------ ------- -------------- -----------

TABLE ASSM 130 10240 81 65536 1048576

SQL > select blocks, count(*) cnt, min(extent_id), min(block_id) from dba_extents where segment_name = 'TEST_TAB' group by blocks order by min(extent_id);

BLOCKS CNT MIN(EXTENT_ID) MIN(BLOCK_ID)

------ --- -------------- -------------

8 16 0 128

128 63 16 256

1024 2 79 8320

and then crosscheck with above raw trace file again. We can see that the first db file sequential read (block#=130) by pread is to read segment header block (HEADER BLOCK: 130), the next 16 db file scattered

read with blocks between 5 and 8 by readv is to read all 16 initial extents (8 blocks per extent), the rest 7 db file scattered read with blocks between 30 and 32 by pread is to read incremental extents (128 blocks per extent).

The size of incremental extents is 128 block, but each scattered read can read maximum 32 blocks (db file multiblock read count=32).

Alireza Kamrani

Database-Box

Wednesday, July 30, 2025

Assessing and Optimizing Network Performance in the Oracle Data Guard Environment

Sunday, July 20, 2025

Oracle Data Access Methods in Action

Oracle Hot Block Concept, Detection and Resolving scenarios

Report Abuse

Labels