Wednesday, July 30, 2025

Assessing and Optimizing Network Performance in the Oracle Data Guard Environment

Assessing and Optimizing Network Performance in the Oracle Data Guard Environment 


           Written by :Alireza Kamrani

              

Oracle Data Guard relies on the underlying network to send redo from the primary database to standby databases. Ensuring that the network is healthy and capable of supporting peak redo generation rates helps avoid future transport lags.

A transport lag forms when the primary database cannot ship redo to the standby faster than primary instance's redo generation rate. A transport lag can lead to potential data loss if a primary database failure occurs.

Network assessment consists of evaluating

  • Network reliability
  • Network bandwidth to accommodate peak redo generation rates

Note:

Each instance of the primary database instance generates its own redo and ships redo to the standby database in a single network stream. Therefore, maximizing single process network throughput for each node is critical for redo transport.

Historically there are areas that can reduce network and redo transport throughput resulting in potential transport lags:

  1. Network firewalls or network encryption
    Network firewalls and network (not Oracle Net) encryption can reduce overall throughput significantly. Verify throughput with the 
    oratcp tool (described below), with and without encryption, and tune accordingly.
    At times reducing the encryption level can increase throughput significantly. A balance is required to meet security needs with your performance and data loss requirements.
  2. Redo transport compression
    When database initialization parameter has LOG_ARCHIVE_DEST_N attribute COMPRESSION=ENABLE, Oracle background processes have to compress the redo before sending network message, and uncompress the redo before processing the redo. This reduces the overall redo and network throughput. 
    Compression is only recommended if network bandwidth is insufficient between the primary and standby destinations.
  3. Oracle Net encryption
    Depending on the Oracle Net encryption level, this will have varying redo throughput impact, because Oracle Net messages containing redo have to be encrypted before sending and then unencrypted before redo processing.
    Note that if database encryption is already enabled with Transparent Data Encryption (TDE), redo is already encrypted, although Oracle Net encryption can also encrypt the message headers.
  4. Untuned network for redo transport
    • Increasing maximum operating system socket buffer size can increase single process throughput by 2-8 times. Test with different socket buffer sizes to see what value yields positive results, and ensure throughput is greater than the peak redo throughput.
    • Compare performance with various MTU settings.
      If average redo write size is less than 1500 bytes, then try various MTU settings including MTU=9000 (For example, Jumbo Frames) for network interface that sends or receives redo on your system. This may reduce some unnecessary network round trips which will increase overall throughput. 
      Also note that for SYNC transport, Oracle's average redo write size (for example, Oracle message send) increases significantly as determined by v$sysstats or AWR statistics "redo size / redo writes".
      When sending redo across geographical regions, experiments have shown that using MTU=9000 can also benefit in some network topologies. Conduct performance tests with oratcp and compare the results with default MTU and MTU=9000 settings.

Gather Topology Information

Understanding the topology of the Oracle Data Guard configuration, and its relevance to Data Guard performance, helps eliminate infrastructure weaknesses that are often incorrectly attributed to the Data Guard architecture.

Oracle recommends that you outline the following high-level architecture information.

  • Describe the primary and standby database system (number of nodes in Oracle RAC cluster, CPUs and memory per database node, storage I/O system)
  • Describe network topology connecting the primary and standby systems
    • Network switches and firewalls in between primary and standby
    • Network bandwidth and latency

For standby databases with symmetric hardware and configuration, and with a well tuned network configuration that can support peak redo generation rates, the transport lag should be less than 1 second. 


Understanding Network Usage of Data Guard

The phases of the Data Guard life cycle which use the network most heavily are:

  • Instantiation - During this phase of standby database creation, files can be copied using parallelism from any host. Determining the degree of parallelism which maximizes throughput between nodes helps to optimize the standby instantiation process.
  • Redo Transport (Steady State)- Under normal Data Guard operation the primary database ships redo to the standby which is then applied. Each RAC instance of a primary database ships redo in a single stream from the host on which it is running. Understanding the requirements of each primary database instance and ensuring a single process can achieve the throughput requirements is critical to a standby database staying current with the primary database.

Understanding Targets and Goals for Instantiation

Instantiation for large databases can take hours, or in extreme cases days. To allow for planning of an instantiation and also maximize throughput between the primary and standby system to complete the instantiation is as timely a manner as possible, first determine the goal for instantiation time. Then follow the process defined below to maximize per process throughput and identify the optimal degree of parallelism between the primary and standby nodes.


Understanding Throughput Requirements and Average Redo Write Size for Redo Transport

Required network bandwidth of a given Data Guard configuration is determined by the redo generate rate of the primary database. 

Note:

In cases where the primary database is pre-existing, a baseline for the required network bandwidth can be established. If there is no existing primary database, skip this step and future references to the data further in the process.


While the Automatic Workload Repository (AWR) tool can be used to determine the redo generation rate, the snapshots are often 30 or 60 minutes apart which can dilute the peak rate. Since peak rates often occur for shorter periods of time, it is more accurate to use the following query which calculates the redo generation rate for each log when run on an existing database. (change the timestamps as appropriate)


SQL> SELECT THREAD#, SEQUENCE#, BLOCKS*BLOCK_SIZE/1024/1024 MB,

(NEXT_TIME-FIRST_TIME)*86400 SEC,

 (BLOCKS*BLOCK_SIZE/1024/1024)/((NEXT_TIME-FIRST_TIME)*86400) "MB/S"

FROM V$ARCHIVED_LOG

WHERE ((NEXT_TIME-FIRST_TIME)*86400<>0)

AND FIRST_TIME BETWEEN TO_DATE('2022/01/15 08:00:00','YYYY/MM/DD HH24:MI:SS')

 AND TO_DATE('2022/01/15 11:00:00','YYYY/MM/DD HH24:MI:SS')

AND DEST_ID=1 ORDER BY FIRST_TIME;


Example output:

THREAD#  SEQUENCE#  MB         SEC        MB/s

------- --------- ---------- ---------- ----------

      2      2291  29366.1963        831  35.338383

      1      2565  29365.6553        781 37.6000708

      2      2292  29359.3403        537  54.672887

      1      2566  29407.8296        813 36.1719921

      2      2293  29389.7012        678 43.3476418

      2      2294  29325.2217       1236 23.7259075

      1      2567  11407.3379         2658 4.29169973

      2      2295  24682.4648        477 51.7452093

      2      2296  29359.4458        954 30.7751004

      2      2297  29311.3638        586 50.0193921

      1      2568  3867.44092       5510 .701894903

Note

To find the peak redo rate, choose times during the highest level of processing, such as peak OLTP periods, End of Quarter batch processing or End of Year batch processing.


♦️In this short example the highest rate was about 52MB/s. Ideally the network will support the maximum rate plus 30% or 68MB/s for this application.


Verify Average Redo Write Size

Using v$sysstats or looking at your AWR reports for various workload and peak intervals, record the average redo write size based on

Average Redo Write Size = "REDO SIZE" / "REDO WRITES"

♦️Use this average redo write size in your oratcp experiments. If the average redo write size > 1500 bytes, experiment with various MTU settings.

Understand Current Network Throughput

The Oracle utility oratcptest is a general-purpose tool for measuring network bandwidth and latency similar to iperf/qperf which can be run by any OS user.


The oratcptest utility provides options for controlling the network load such as:

  • Network message size
  • Delay time between messages
  • Parallel streams
  • Whether or not the oratcptest server should write messages on disk.
  • Simulating Data Guard SYNC transport by waiting for acknowledgment (ACK) of a packet or ASYNC transport by not waiting for the ACK.

Note

This tool, like any Oracle network streaming transport, can simulate efficient network packet transfers from the source host to target host similar to Data Guard transport. Throughput can saturate the available network bandwidth between source and target servers. 

♦️Therefore, Oracle recommends that short duration tests are performed and that consideration is given for any other critical applications sharing the same network.


Measure the Existing Throughput of One and Many Processes

Do the following tasks to measure the existing throughput.

Task 1: Install oratcptest

  1. Download the oratcptest.jar file from MOS note 2064368.1
  2. Copy the JAR file onto both client (primary) and server (standby)

    Note:
    oratcptest requires JRE 6 or later
  3. Verify that the host has JRE 6 or later
  4. On all primary and standby hosts, verify that the JVM can run the JAR file by displaying the help
    # java -jar oratcptest.jar -help

Task 2: Determine the Existing Throughput for a Single Process

Data Guard asynchronous redo transport (ASYNC) uses a streaming protocol which does not wait for packet acknowledgment and therefore achieves higher rates than SYNC transport. 

  1. Start the test server on the receiving (standby) side.

    java -jar oratcptest.jar -server [IP of standby host or VIP in RAC
  2. configurations] -port=<any available port number>
  3. Run the test client. (Change the server address and port number to match that of your server started in step 4.)
  4. java -jar oratcptest.jar [IP of standby host or VIP in RAC configurations] -port=<port number> -mode=async -duration=120 -interval=20s       [Requesting a test] Message payload  = 1 Mbyte Payload content type= RANDOM Delay between messages = NO Number of connections  = 1 Socket send buffer = (system default) Transport mode = ASYNC Disk write = NO Statistics interval    = 20 seconds Test duration= 2 minutes Test frequency  = NO Network Timeout = NO (1 Mbyte = 1024x1024 bytes) (17:54:44) The server is ready. Throughput (17:55:04)     20.919 Mbytes/s (17:55:24)     12.883 Mbytes/s (17:55:44)  10.457 Mbytes/s (17:56:04)  10.408 Mbytes/s (17:56:24)     12.423 Mbytes/s (17:56:44)  13.701 Mbytes/s (17:56:44) Test finished. Socket send buffer = 2 Mbytes Avg. throughput = 13.466 Mbytes/s

♦️In this example the average throughput between these two nodes was about 13 MB/s which does not meet the requirements of 68 MB/s from the query.

Note

This process can be scheduled to run at a given frequency using the -freq option to determine if the bandwidth varies at different times of the day. For instance setting -freq=1h/24h will repeat the test every hour for 24 hours.


Task 3: Determine Existing Throughput for Multiple Processes

  1. Repeat the previous test with two (2) connections (using num_conn parameter). 

    java -jar oratcptest.jar <target IP address> -port=<port number> -duration=60s -interval=10s -mode=async                                                   [-output=<results file>] -num_conn=2 [Requesting a test] Message payload        = 1 Mbyte Payload content type   = RANDOM Delay between messages = NO Number of connections  = 2 Socket send buffer = (system default) Transport mode  = ASYNC Disk write = NO Statistics interval  = 20 seconds Test duration  = 2 minutes Test frequency  = NO Network Timeout  = NO (1 Mbyte = 1024x1024 bytes) (18:08:02) The server is ready Throughput (18:08:22)     44.894 Mbytes/s (18:08:42)     23.949 Mbytes/s (18:09:02)     25.206 Mbytes/s (18:09:22)     23.051 Mbytes/s (18:09:42)     24.978 Mbytes/s (18:10:02)     22.647 Mbytes/s (18:10:02) Test finished. Avg. socket send buffer = 2097152 Avg. aggregate throughput = 27.454 Mbytes/s
  2. Re-run step 1 Iteratively and increase the value of num_conn by two each time until the aggregate throughput does not increase for three consecutive values. For example if the aggregate throughput is approximately the same for 10, 12 and 14 connections, stop.
    Note
    RMAN can utilize all nodes in the cluster for instantiation. To find the total aggregate throughput, see My Oracle Support Creating a Physical Standby database using RMAN restore database from service (Doc ID 2283978.1)
  3. Run the same test with all nodes in all clusters to find the current total aggregate throughput. Node 1 of primary to node 1 of standby, node 2 to node 2, etc. Sum the throughput found for all nodes.
  4. Reverse the roles and repeat the tests.
  5. Note the number of connections which achieved the best aggregate throughput.

Use the total size of the database and total aggregate throughput to estimate the amount of time it will take to complete the copy of the database. 

A full instantiation also needs to apply the redo generated during the copy. Some additional percentage (0%-50%) should be added to this estimated time based on how active the database is.

If the estimated time meets the goal, no additional tuning is required for instantiation.


Sunday, July 20, 2025

Oracle Data Access Methods in Action

                                                         

Oracle Data access methods in Action

Alireza Kamrani     20 Jul 2025


Data Accesses

To use the data stored in database, applications need to expose it or retrieve it. That is the task of data accesses provided by Oracle.

Based on data locality, they are differentiated as physical read and logical read. Logical read is further divided into consistent gets and current gets.

 

Physical Read

To access any data, firstly Oracle has to move it from disk (persistent mass storage) to memory (volatile main storage), that is, physical read (disk read, db file read, cold read).

Oracle provides 3 basic approaches of db fille read:

(a). db fille sequential read

(b). db file scattered read

(c). db file parallel read

In this Section, we will look into different access paths and investigate their executions with tools like:

(1). Sql Trace

(2). Dtrace

(3). Oracle View: v$filestat and v$iostat_file

Note: All tests are done in Oracle 12.1.0.2 on Solaris.

 

In the following test, we create a table and one index on it, each row occupies about 1 DB block (db block size = 8192).

Full test code is appended at the end of this section.

 

SQL> create table test_tab tablespace test_ts as

select level x, rpad('ABC', 3500, 'X') y, rpad('ABC', 3500, 'X') z from dual connect by level <= 1e4;

SQL> create index test_tab#i1 on test_tab(x) tablespace test_ts;

SQL> select round(bytes/1024/1024) mb, blocks from dba_segments where segment_name = 'TEST_TAB';

--80 10240

 

DB File Read Access Path

We will run 4 variants of access path tests, and measure their performance in Sql Trace (event 10046) and Dtrace (see appended Dtrace Script).

 

Test-1 Single Read

As the first test, we will select 333 adjacent rows by rowid. Here the Sql Trace output:

SQL > exec db_file_read_test('single', 1, 333);

-- adjacent rowid, single block read, 'db file sequential read'

SELECT /*+ single_read */ Y FROM TEST_TAB T WHERE ROWID = :B1

 

call                         count     cpu         elapsed                disk                       query                   current                 rows

-------                     ------      --------   ----------                ----------                ----------                ----------          ----------

Parse                     1             0.00       0.00                      0                            0                             0                             0

Execute                333        0.01       0.01                          0                          0                             0                             0

Fetch                     333        0.01       0.01                       641                        333                        0                             333

-------                     ------    --------   ----------            ----------                       ----------                ----------            ----------

total                       667        0.02     0.02                       641                        333                        0                            333

Row Source Operation

---------------------------------------------------

TABLE ACCESS BY USER ROWID TEST_TAB (cr=1 pr=8 pw=0 time=160 us cost=1 size=3513 card=1)

Event waited on                               Times        Max. Wait            Total Waited

---------------------------------------- Waited        ----------                       ------------

db file scattered read                           44                     0.00                       0.00

db file sequential read        289                    0.00                       0.00

To read 333 rows, we perform 44 scattered read and 289 sequential reads, in total, 333 reads.

However, 641 blocks are read into memory because of scattered read.

Dtrace output reveals more details about lower OS layer calls:

PROBEFUNC                       FD                           RETURN_SIZE                     COUNT

lseek                                      260                        0                                             44

readv                                     260                        65536                                   44

pread                                    260                        8192                                      289

PROBEFUNC                       FD                           MAX_READ_Blocks

pread                                    260                                        1

readv                                     260                                        8

TOTAL_SIZE = 5251072 , TOTAL_READ_Blocks = 641 , TOTAL_READ_CNT = 333

readv                     260

value    ------------------------------------------ Distribution ----------------------------------------                       count

8192 |                                                                                                                                                                                  0

16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@     43

32768 |@                                                                                                                                                                            1

65536 | 0

pread    260

value ------------------------------------------ Distribution ------------------------------------------                             count

2048 |                                                                                                                                                                                   0

4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                                           226

8192 |@@@@@@@@                                                                                                                                                60

16384 |                                                                                                                                                                                2

32768 |                                                                                                                                                                                0

65536 |                                                                                                                                                                                1

131072 |                                                                                                                                                                           0

 

44 scattered read are ful_lled by 44 readv from filedescriptor FD: 260 (which are proceeded by 44 lseek), each of which fetches 65536 bytes (8 DB blocks).

289 sequential read are done by 289 pread, each of which fetches 8192 bytes (1 DB block).

Totally we read 641 DB blocks in 333 read OS calls.

Now we look the Dtrace quantize (frequency distribution diagram) output, in which the values in all lines

are always increased by power-of-two in nanoseconds. Each line indicates the count of the number of

elements greater than or equal to the corresponding value, but less than the next larger row value. It is

similar to Oracle Wait Event Histogram (for instance, v$event histogram).

The whole elapsed time (multiplied by 1.5 to get average value) can be estimated as:

readv: (16384*43 + 32768*1)*1.5 = 1105920

pread: (4096*226 + 8192*60 + 16384*2 + 65536*1)*1.5 = 2273280

total: 1105920 + 2273280 = 3379200

 

The total elapse time of 3 millisecond (3379200 ns) in Dtrace is much less than xplan 20 millisecond (0.02 second) since Dtrace only collects time of OS IO activities, the other 17 ms could be consumed in the

DB side. For example, in the above xplan, execute phase took 10 millisecond (0.01 second), whereas two Wait Events there: db file scattered read and db file sequential read having Total Waited equal to 0.00

(the minimum time unit in xplan is centisecond, which seems inherited from old Oracle hundredths of a second counting).

We can also compare elapsed time per block read for readv (8 blocks per read request), and pread (1 block per read request),

 thereby evaluate the exact performance difference between single block read and multi block read.

 

The result shows that readv is 2 times faster than pread per block read.

readv: (16384*43 + 32768*1)*1.5/8/44 = 3142

pread: (4096*226 + 8192*60 + 16384*2 + 65536*1)*1.5/289 = 7866

Test-2 Scattered Read

In the second test, we also select 333 rows by rowid. Instead of adjacent rows, we read one row after

skipping 10 rows (see appended Test Code).

Here the Sql Trace output:

SQL > exec db_file_read_test('scattered', 1, 333);

-- jumped rowid, scattered read, 'db file scattered read'

SELECT /*+ scattered_read */ Y FROM TEST_TAB T WHERE ROWID =: B1

call                         count    cpu        elapsed                 disk                         query   current rows

-------                     ------      --------   ----------                ----------                ----------                ----------                ----------

Parse                     1             0.00       0.00                      0                            0                             0                             0

Execute                333        0.00       0.00                       0                             0                             0                             0

Fetch                     333        0.02       0.02                       2664                      333                        0                             333

-------                     ------     --------  ----------               ----------                -----------               ---------- ---------

total                       667        0.02       0.02                       2664                      333                        0                             333

Row Source Operation

---------------------------------------------------

TABLE ACCESS BY USER ROWID TEST_TAB (cr=1 pr=8 pw=0 time=156 us cost=1 size=3513 card=1)

Elapsed times include waiting on following events:

 

Event waited                          Times                              Max. Wait                            Total Waited

---------------------------             Waited                           ----------                                ------------

db file scattered read              333                               0.00                                      0.00

 

Oracle chooses db file scattered read to fetch all 333 rows with 2664 disk reads.

 But xplan looks identical as single read, so xplan alone is not able to reveal the difference.

 

But Dtrace output shows the di_erence:

------------------------------ dtrace ------------------------------

PROBEFUNC                       FD                           RETURN_SIZE                                    COUNT

Lseek                                     260                                        0                                             91

readv                                     260                                        65536                                   333

PROBEFUNC                       FD                           MAX_READ_Blocks

Readv                                    260                                       8

TOTAL_SIZE = 21823488 , TOTAL_READ_Blocks = 2664 , TOTAL_READ_CNT = 333

readv                                     260

value ------------- Distribution -------------                                                                                                   count

8192 |                                                                                                                                                                                   0

16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@       290

32768 |@@@@@                                                                                                                                                          43

65536 |                                                                                                                                                                                0

 

Each readv request returns 8 DB blocks, 333 readv accumulated to exactly 333 x 8 = 2664.

In other hand, 91 lseek moving probably indicates that most of blocks are located next to each other.

Test-3 Parallel Read

In the next test, we read 333 rows by index range scan. Sql Trace shows the third type of db file read:

db file parallel read. In the output, we also include part of Raw Trace file.

SQL > exec db_file_read_test('parallel', 1, 333);

SELECT /*+ index(t test_tab#i1) parallel_read */ MAX(Y) FROM TEST_TAB T WHERE X BETWEEN 1 AND :B1

call                         count                     cpu                         elapsed                 disk                        query                    current                 rows

-------                     ------                      --------                   ----------                ----------                ----------                ----------                                ----------

Parse                     1                             0.00                      0.00                      0                             0                             0                             0

Execute                1                             0.00                       0.00                       0                             0                             0                            0

Fetch                     1                             0.00                       0.00                       344                        335                        0                             1

-------                     ------                      --------                   ----------                ----------                ----------                ----------                                ----------

total                       3                             0.00                       0.00                       344                        335                        0                            1

 

Row Source Operation

---------------------------------------------------

SORT AGGREGATE (cr=335 pr=344 pw=0 time=3760 us)

FILTER (cr=335 pr=344 pw=0 time=1698 us)

TABLE ACCESS BY INDEX ROWID BATCHED TEST_TAB (cr=335 pr=344 pw=0 time=1361 us cost=168 size=1167165 card=333)

INDEX RANGE SCAN TEST_TAB#I1 (cr=2 pr=8 pw=0 time=279 us cost=1 size=0 card=333)(object id 2260477)

Event waited on                                    Times                                Max. Wait                            Total Waited

---------------------------------------- Waited------------  ----------                               ------------

db file scattered read                                      4                             0.00                                       0.00

db file parallel read                                          2                             0.00                                       0.00

-- Raw Trace File --

'db file scattered read' ela= 49 file#=917 block#=10368 blocks=8 obj#=2260477 (Index TEST_TAB#I1)

'db file scattered read' ela= 27 file#=917 block#=128 blocks=8 obj#=2260476 (Table TEST_TAB)

'db file scattered read' ela= 21 file#=917 block#=136 blocks=8 obj#=2260476

'db file parallel read' ela= 422 files=1 blocks=127 requests=127 obj#=2260476

'db file parallel read' ela= 334 files=1 blocks=127 requests=127 obj#=2260476

'db file scattered read' ela= 264 file#=917 block#=409 blocks=66 obj#=2260476

Look Raw Trace file, first 3 lines are db file scattered read with blocks=8 (one of which is to read index

TEST TAB#I1), then 2 lines of db file parallel read with both blocks=127 and requests=127, last line is

one db file scattered read with blocks=66. In total, we made 3*8 + 2*127 + 66 = 344 disk reads in 258 read requests.

Dtrace Ouput shows more details of OS calls:

 

PROBEFUNC                       FD                           RETURN_SIZE                     COUNT

pread                                    260                        540672                                1

lseek                                      260                        0                                             2

readv                                     260                        65536                                   3

pread                                    260                        8192                                      254

PROBEFUNC                       FD                           MAX_READ_Blocks

Readv                                    260                        8

pread                                    260                        66

TOTAL_SIZE = 2818048 , TOTAL_READ_Blocks = 344 , TOTAL_READ_CNT = 258

readv                                     260

value -------------------------------- Distribution -------------------------------            count

8192 |                                                                                                                                   0

16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@          2

32768 |@@@@@@@@@@@@@                                                                       1

65536 |                                                                                                                                0

pread                                                                                                                                    260

value -------------------------------- Distribution --------------------------------           count

2048 |                                                                                                                                   0

4096 |@@@@@@@@@@@@@@@@@@@@@@@@@                      196

8192 |@@@@@@@@                                                                                                52

16384 |                                                                                                                                2

32768 |                                                                                                                                0

65536 |@                                                                                                                           4

131072 |                                                                                                                              1

262144 |                                                                                                                             0

Crosschecking Sql Raw Trace with Dtrace, we can see:

3 blocks=8 db file scattered read are implemented by 3 readv with RETURN SIZE=65536 each.

2 blocks=127 db file parallel read are satisfied by 254 pread with RETURN SIZE=8192 each.

1 blocks=66 db file scattered read is done by 1 pread with RETURN SIZE=540672(=66*8192).

In total, we read 344 DB blocks by 258 (=3+254+1) OS read calls.

The last db file scattered read with blocks=66 also shows that one pread can read 66 blocks, much higher than db file multiblock read count=32 configured in this database.

Since 66 is not divisible by 32, it is probably an OS disk read optimization (disk read merging) for Oracle "Batched" reads, which is visible in xplan as "table access by index rowid batched".

Such kind of pread is triggered after low level OS optimization, that is probably why db file multiblock read count=32 has no effect there.

"Batched" reads is controlled by Oracle 12c hidden parameter optimizer batch table access by rowid (enable table access by ROWID IO batching), or 11g nlj batching enabled (enable batching of the RHS IO in NLJ).

For example, "Batched" can be disabled by:

SELECT /*+ index(t test_tab#i1) opt_param('_optimizer_batch_table_access_by_rowid', 'false') parallel_read */ MAX(Y)

FROM TEST_TAB T WHERE X BETWEEN 1 AND :B1;

In xplan, db file parallel read is indicated with Times Waited being 2, but real OS calls are 254 pread

requests. We will discuss it later on AIO read.

By the way, we have 3 readv, but only 2 lseek, so there are probably 2 readv share one lseek.

From above Sql Trace output and Dtrace output, we can see that pread can ful_ll both db file parallel

read and db files cattered read. Back to Test-1 Single Read, in which db file sequential read is also

performed by pread, we can say pread is universal for all 3 types of db file reads.

 

Test-4 Full Read

As the last test, we read 333 rows by a full table scan. Here the Sql Trace output including its raw trace

lines:

SQL > exec db_file_read_test('full', 1, 333);

SELECT /*+ full_read */ MAX(Y) FROM TEST_TAB T WHERE ROWNUM <= :B1

call                         count                     cpu                         elapsed                               disk                        query                    current                 rows

-------                     ------                      --------                  ----------                                ----------                ----------                ----------                 ----------

Parse                     1                             0.00                       0.00                                       0                             0                             0                             0

Execute                1                             0.00                       0.00                                       0                            0                             0                             0

Fetch                     1                             0.00                       0.00                                       342                        342                        3                             1

-------                     ------                      --------                  ----------                                ----------                ----------                ----------                 ----------

total                       3                             0.00                       0.00                                       342                        342                        3                             1

Row Source Operation

---------------------------------------------------

SORT AGGREGATE (cr=342 pr=342 pw=0 time=3788 us)

COUNT STOPKEY (cr=342 pr=342 pw=0 time=1371 us)

TABLE ACCESS FULL TEST_TAB (cr=342 pr=342 pw=0 time=925 us cost=99 size=1169334 card=334)

Event waited on                                   Times                                    Max. Wait        Total Waited

---------------------------------------- Waited------------------    ----------      ------------

db file sequential read    2                                                            0.00       0.00

db file scattered read                      23                                                           0.00       0.00

-- Raw Trace File --

'db file sequential read' ela= 19 file#=917 block#=130 blocks=1 obj#=2260476 tim=647299975335

'db file sequential read' ela= 14 file#=3 block#=768 blocks=1 obj#=0 tim=647299975399

-- UNDO file#=3 /oratestdb/oradata/testdb/undo01.dbf

'db file scattered read' ela= 22 file#=917 block#=131 blocks=5 obj#=2260476 tim=647299975501

'db file scattered read' ela= 25 file#=917 block#=136 blocks=8 obj#=2260476 tim=647299975609

'db file scattered read' ela= 25 file#=917 block#=145 blocks=7 obj#=2260476 tim=647299975713

'db file scattered read' ela= 23 file#=917 block#=152 blocks=8 obj#=2260476 tim=647299975806

'db file scattered read' ela= 25 file#=917 block#=161 blocks=7 obj#=2260476 tim=647299975901

'db file scattered read' ela= 24 file#=917 block#=168 blocks=8 obj#=2260476 tim=647299975994

'db file scattered read' ela= 23 file#=917 block#=177 blocks=7 obj#=2260476 tim=647299976088

'db file scattered read' ela= 23 file#=917 block#=184 blocks=8 obj#=2260476 tim=647299976178

'db file scattered read' ela= 23 file#=917 block#=193 blocks=7 obj#=2260476 tim=647299976270

'db file scattered read' ela= 22 file#=917 block#=200 blocks=8 obj#=2260476 tim=647299976364

'db file scattered read' ela= 23 file#=917 block#=209 blocks=7 obj#=2260476 tim=647299976465

'db file scattered read' ela= 22 file#=917 block#=216 blocks=8 obj#=2260476 tim=647299976554

'db file scattered read' ela= 22 file#=917 block#=225 blocks=7 obj#=2260476 tim=647299976646

'db file scattered read' ela= 29 file#=917 block#=232 blocks=8 obj#=2260476 tim=647299976759

'db file scattered read' ela= 24 file#=917 block#=241 blocks=7 obj#=2260476 tim=647299976866

'db file scattered read' ela= 23 file#=917 block#=248 blocks=8 obj#=2260476 tim=647299976956

'db file scattered read' ela= 128 file#=917 block#=258 blocks=32 obj#=2260476 tim=647299977200

'db file scattered read' ela= 95 file#=917 block#=290 blocks=32 obj#=2260476 tim=647299977511

'db file scattered read' ela= 97 file#=917 block#=322 blocks=32 obj#=2260476 tim=647299977822

'db file scattered read' ela= 87 file#=917 block#=354 blocks=30 obj#=2260476 tim=647299978113

'db file scattered read' ela= 96 file#=917 block#=386 blocks=32 obj#=2260476 tim=647299978407

'db file scattered read' ela= 108 file#=917 block#=418 blocks=32 obj#=2260476 tim=647299978719

'db file scattered read' ela= 94 file#=917 block#=450 blocks=32 obj#=2260476 tim=647299979021.

 

We made 2 db file sequential read with blocks=1, and 23 db file scattered read with blocks varied from 5 to 32, but summing up them together,

it is 342 disk reads. In this case, the maximum blocks=32 is probably dictated by db file multiblock read count.

Note that the second db file sequential read is to read undo data block (file#=3 block#=768 obj#=0).

It is visible in Raw Trace, and counted in xplan statistics.

Look again Dtrace output:

PROBEFUNC                       FD                          RETURN_SIZE                                     COUNT

Pread                                    260                       8192                                                     1

pread                                    260                        245760                                                1

readv                                     260                        40960                                                   1

pread                                    260                        262144                                                6

readv                                     260                        57344                                                   7

lseek                                      260                        0                                                             8

readv                                     260                        65536                                                    8

PROBEFUNC                       FD                           MAX_READ_Blocks

readv                                     260                        8

pread                                    260                        32

TOTAL_SIZE = 2793472 , TOTAL_READ_Blocks = 341 , TOTAL_READ_CNT = 24

readv                                     260

value ----------------------------- Distribution --------------------------        count

8192 |                                                                                                                                   0

16384 |@@@@@@@@@@@@@@@@@@                                 16

32768 |                                                                                                                                0

pread                                                                                                                                    260

value ----------------------------- Distribution -----------------------------   count

16384 |                                                                                                                                0

32768 |@@@@@@@@@@@@@@@@@@@@                         4

65536 |@@@@@@@@@@@@@@@                                                              3

131072 |@@@@@                                                                                                        1

262144 |                                                                                                                             0

 

Crosscheck Sql Raw Trace with Dtrace, we can see:

 

1 blocks=1 db file sequential read is implemented by 1 pread with RETURN SIZE=8192.

1 blocks=5 db file scattered read is implemented by 1 readv with RETURN SIZE=40960.

7 blocks=7 db file scattered read is implemented by 7 readv with RETURN SIZE=57344.

8 blocks=8 db file scattered read is implemented by 8 readv with RETURN SIZE=65536.

1 blocks=30 db file scattered read is implemented by 1 pread with RETURN SIZE=245760.

6 blocks=32 db file scattered read is implemented by 6 pread with RETURN SIZE=262144.

 

In total, we read 341 DB blocks by 24 OS read calls.

 

Sql Trace showed 25 (=23+2) reads to get 342 blocks, Dtrace showed 24 reads to get 341 blocks, one extra

read in Sql Trace is due to one undo read (file#=3). (Note: in Dtrace script, we only trace FD=260,

that is file#=917, undo file#=3 is not traced)

If we display segment extent allocations by two queries below:

 

SQL > select segment_type, segment_subtype, header_block, blocks, extents, initial_extent, next_extent from dba_segments v where segment_name = 'TEST_TAB';

 

SEGMENT_TYPE                SEGMENT_SU       HEADER_BLOCK             BLOCKS                 EXTENTS              INITIAL_EXTENT                 NEXT_EXTENT

-------------                            ----------                  ------------                           ------                      -------                     --------------                          -----------

TABLE                                    ASSM                       130                                     10240                   81                          65536                                   1048576

SQL > select blocks, count(*) cnt, min(extent_id), min(block_id) from dba_extents where segment_name = 'TEST_TAB' group by blocks order by min(extent_id);

BLOCKS CNT       MIN(EXTENT_ID)              MIN(BLOCK_ID)

------                      ---          --------------                          -------------

8                             16           0                                             128

128                        63           16                                           256

1024                      2             79                                           8320

and then crosscheck with above raw trace file again. We can see that the first db file sequential read (block#=130) by pread is to read segment header block (HEADER BLOCK: 130), the next 16 db file scattered

read with blocks between 5 and 8 by readv is to read all 16 initial extents (8 blocks per extent), the rest 7 db file scattered read with blocks between 30 and 32 by pread is to read incremental extents (128 blocks per extent).

The size of incremental extents is 128 block, but each scattered read can read maximum 32 blocks (db file multiblock read count=32).

Alireza Kamrani

 

 

 

 

 

 

 

Oracle Hot Block Concept, Detection and Resolving scenarios

  1. What is a “Hot Block” in Oracle Database ? How to detect and resolve it in Oracle Database? When your Oracle database slows down, the c...