Thursday, February 27, 2025

Oracle SGA vs PGA Usages and Concepts

Why Oracle Uses PGA Instead of SGA for Large Table Scans and How to Optimize It


Alireza Kamrani
02/27/2025

When selecting from large tables, Oracle might use PGA instead of SGA, depending on memory settings, query complexity, and table size. Understanding how full table scans, SGA caching, and PGA memory allocation interact can help optimize performance.

Direct path read:
The session is waiting for a direct read to complete. A direct read is a physical I/O from a data file that bypasses the buffer cache and reads the data block directly into process-private memory.

If asynchronous I/O is supported (and in use), then Oracle can submit I/O requests and continue processing. Oracle can then pick up the results of the I/O request later and wait on "direct path read" until the required I/O completes.

If asynchronous I/O is not being used, then the I/O requests block until completed but these do not show as waits at the time the I/O is issued. The session returns later to pick up the completed I/O data but can then show a wait on "direct path read" even though this wait will return immediately.

Hence this wait event is very misleading because:

• The total number of waits does not reflect the number of I/O requests

• The total time spent in "direct path read" does not always reflect the true wait time.

This style of read request is typically used for:

• Sort I/O (when a sort does not fit in memory)

• Parallel Query slaves

• Read ahead (where a process may issue an I/O request for a block it expects to need in the near future)
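To gauge how much a system waits on this event (keeping the caveats above in mind), you can query the cumulative wait statistics. This is a sketch assuming access to the V$SYSTEM_EVENT view:

```sql
-- Cumulative waits for direct path reads since instance startup.
-- Remember: under asynchronous I/O these counts understate the real I/O time.
SELECT event,
       total_waits,
       ROUND(time_waited_micro / 1e6, 1) AS seconds_waited
FROM   v$system_event
WHERE  event LIKE 'direct path read%';
```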


Why Oracle Uses PGA Instead of SGA in Large Table Queries

1. Full Table Scans and Direct Path Reads (Bypassing SGA)

• When performing a full table scan on a very large table, Oracle may choose direct path reads, bypassing the SGA (Buffer Cache) and reading data directly into PGA.

• This happens when:

• The table is too large to fit into DB_CACHE_SIZE.

• The query requests a parallel execution plan.

• The system is under memory pressure, and Oracle determines that using PGA is more efficient than filling up SGA with temporary data.

• Result: Data is not cached in SGA, meaning future queries on the same table will not benefit from cache reuse, leading to repeated disk I/O.

2. Impact of DB_BIG_TABLE_CACHE_PERCENT_TARGET

Oracle introduced the DB_BIG_TABLE_CACHE_PERCENT_TARGET parameter to control large table caching in SGA.

• By default, large table scans use direct path reads (bypassing the SGA).

• If DB_BIG_TABLE_CACHE_PERCENT_TARGET is set, Oracle will attempt to cache full table scan results in a special area of the buffer cache, reducing repeated disk I/O.

Example Configuration:

ALTER SYSTEM SET DB_BIG_TABLE_CACHE_PERCENT_TARGET = 30;

• Effect: Reserves 30% of the buffer cache for full table scans, reducing reliance on direct path reads and keeping frequently scanned large tables in memory.
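Once the parameter is set, the big table cache can be observed through the V$BT_SCAN_CACHE and V$BT_SCAN_OBJ_TEMPS views (a sketch; exact columns may vary by release):

```sql
-- Overall state of the big table cache
SELECT bt_cache_target, object_count, memory_buf_alloc
FROM   v$bt_scan_cache;

-- Per-object "temperature" (how hot each cached big table is)
SELECT dataobj#, size_in_blks, temperature
FROM   v$bt_scan_obj_temps;
```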


3. When PGA is Used More Than SGA

Oracle moves sorting and temporary operations to PGA instead of SGA in the following cases:

(a) Sort and Hash Joins on Large Datasets

• If a query involves sorting, aggregations, or hash joins, Oracle may store intermediate results in PGA rather than SGA.

• If the PGA_AGGREGATE_TARGET is too small, Oracle will spill temporary data to TEMP tablespace, slowing down performance.


Optimization:

ALTER SYSTEM SET PGA_AGGREGATE_TARGET = 4G;  -- adjust based on your workload
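To judge whether the current PGA_AGGREGATE_TARGET is adequate, check how many work areas completed in memory (optimal) versus spilling to TEMP (one-pass or multi-pass). A sketch using V$SQL_WORKAREA_HISTOGRAM:

```sql
-- Work areas that spilled to TEMP show up as one-pass or multi-pass executions
SELECT low_optimal_size / 1024        AS low_kb,
       (high_optimal_size + 1) / 1024 AS high_kb,
       optimal_executions,
       onepass_executions,
       multipasses_executions
FROM   v$sql_workarea_histogram
WHERE  total_executions > 0;
```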

(b) Large Full Table Scans Without Proper Caching

• If DB_BIG_TABLE_CACHE_PERCENT_TARGET is not set, Oracle bypasses the buffer cache for large tables, forcing data to be processed in PGA or TEMP tablespace.

• This is inefficient for frequently accessed large tables.

Solution: Enable Large Table Caching

ALTER SYSTEM SET DB_BIG_TABLE_CACHE_PERCENT_TARGET = 25;

• Now, Oracle will try to keep large table scans in memory, avoiding repeated disk reads.

4. Should You Keep Large Tables in SGA or Let Oracle Use PGA?

• Frequently scanned large tables

Recommendation: Enable DB_BIG_TABLE_CACHE_PERCENT_TARGET.

• One-time full table scans (ad hoc queries)

Recommendation: Allow PGA direct path reads.

• Large table joins with sorting/hashing

Recommendation: Increase PGA_AGGREGATE_TARGET.

• System has low memory

Recommendation: Use direct path reads to reduce SGA pressure.

• Keeping large tables in SGA (DB_BIG_TABLE_CACHE_PERCENT_TARGET) is beneficial only if they are frequently accessed.

• For one-time scans or infrequent queries, direct path reads using PGA are better, as they prevent SGA pollution (filling cache with unnecessary data).

5. Monitoring Full Table Scan Behavior

To check if Oracle is using direct path reads (PGA) instead of SGA:

SELECT name, value
FROM   v$sysstat
WHERE  name IN ('table scans (direct read)',
                'table scans (long tables)',
                'physical reads direct');

• High "physical reads direct" → Indicates full table scans are bypassing the buffer cache (using PGA).

• High "table scans (long tables)" → Oracle is scanning large tables frequently.

If direct path reads are too high and performance is suffering, consider increasing DB_BIG_TABLE_CACHE_PERCENT_TARGET.

Conclusion

• Oracle uses PGA instead of SGA for large table scans when full table scans are performed via direct path reads.
• Setting DB_BIG_TABLE_CACHE_PERCENT_TARGET helps keep frequently accessed large tables in memory to avoid repeated I/O.
• For queries involving sorting, hashing, or aggregations, PGA tuning is critical (PGA_AGGREGATE_TARGET should be increased to prevent excessive disk usage).
• Use SGA for frequently accessed tables and PGA for complex sorting and temporary operations.

Alireza Kamrani, Database Consultant/ACE

Tuesday, February 18, 2025

An Overview of Oracle database buffer cache behavior and keep blocks into memory solutions



Alireza Kamrani

18 Feb 2025

We can run commands to cache a table's blocks in the Oracle buffer cache, but is there any guarantee that all blocks of that table will stay in the buffer cache at all times?

In Oracle, if you want a table to always be cached in memory to improve performance, you can use the KEEP buffer pool or the Result Cache. However, the CACHE clause of the ALTER TABLE command is primarily a hint for keeping table blocks in the default buffer cache.

Options to Keep a Table in Memory:

1. Using the KEEP Buffer Pool

Oracle allows you to keep frequently accessed tables in the KEEP buffer pool to avoid them being aged out of memory.

• Enable the KEEP pool in your SGA:

ALTER SYSTEM SET DB_KEEP_CACHE_SIZE = nG;

n: adjust it based on your available memory.

• Assign the table to the KEEP pool:

ALTER TABLE my_table STORAGE (BUFFER_POOL KEEP);

Or, alternatively:

ALTER TABLE my_table CACHE;

2. Using the CACHE Clause

You can mark a table so that blocks read by full table scans are placed at the most recently used end of the buffer cache LRU list, instead of being aged out quickly:

ALTER TABLE my_table CACHE;

• This helps in keeping the table’s data in the buffer cache for longer.

• Useful for small lookup tables that are frequently accessed.

3. Using the Result Cache (For Query Caching)

If the table is frequently queried and doesn't change often, you can enable query result caching:

ALTER TABLE my_table RESULT_CACHE (MODE FORCE);

• This keeps the query results in the Result Cache rather than just buffering the table blocks.


Which One Should You Use?

• Use KEEP Buffer Pool for frequently accessed small tables that should always stay in memory.

• Use the CACHE clause for tables where you want their data blocks to stay in the buffer cache longer.

• Use RESULT CACHE if the queries on the table return the same results frequently and the table doesn't change often.
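To verify which buffer pool a segment is currently assigned to, the data dictionary can be queried (a sketch; MY_TABLE is a placeholder):

```sql
SELECT segment_name, segment_type, buffer_pool
FROM   user_segments
WHERE  segment_name = 'MY_TABLE';
```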


Is there any solution that guarantees keeping all of a table's blocks resident in memory?

No. Oracle does not force a full cache or guarantee that all blocks of a table (or of the whole database) will remain in memory.
After any of the caching options above, Oracle does not load the blocks immediately; the command simply completes.
These options only raise the priority of the table's blocks in the cache. There is no guarantee: the actual behavior depends heavily on memory pressure and workload.

While Oracle uses a sophisticated caching mechanism, the database will not guarantee that all blocks will stay in memory at all times. Oracle uses the System Global Area (SGA) and the Buffer Cache to cache frequently accessed data blocks, but the caching behavior is influenced by various factors, such as:

• Memory Availability: The SGA has a limited amount of memory, and the database will keep as many blocks in memory as possible, but once it exceeds its capacity, older or less frequently used blocks will be flushed to disk.

• Database Cache Management: Oracle dynamically manages the buffer cache to balance memory between different cache types (like the Database Buffer Cache, Shared Pool, etc.), based on usage patterns.

• LRU (Least Recently Used) Algorithm: Blocks that are not frequently accessed are more likely to be replaced by other blocks. This means that not all blocks are guaranteed to remain in memory.


• Memory Parameters: Settings like DB_CACHE_SIZE and PGA_AGGREGATE_TARGET determine how much memory is allocated to caching. However, the actual memory used can fluctuate based on system activity and load.

• Large Tables or Objects: Very large objects or tables might not fit entirely in memory, and Oracle will load and evict blocks based on access patterns.

In summary, Oracle will try to keep as much data in memory as it can based on available resources and usage patterns, but it does not guarantee that all blocks will remain cached in memory all the time.
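One way to see how much of a table actually sits in the buffer cache at a given moment is to count its buffers in V$BH (a sketch; requires access to the V$ views, and MY_TABLE is a placeholder):

```sql
SELECT o.object_name, COUNT(*) AS cached_blocks
FROM   v$bh b
JOIN   dba_objects o ON o.data_object_id = b.objd
WHERE  o.object_name = 'MY_TABLE'
GROUP  BY o.object_name;
```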


What is usage of Force Full Caching mechanism?

If you are using the ALTER DATABASE FORCE FULL DATABASE CACHING command, the behavior changes slightly.

When you run this command, Oracle will indeed attempt to keep all data blocks in memory. Specifically:

• Force Full Database Caching: This command forces Oracle to cache all database blocks for tables and indexes into memory (specifically into the Database Buffer Cache). This means that Oracle will attempt to keep all blocks that are accessed in memory, regardless of the usual cache replacement mechanisms.

However, there are still some important nuances to understand:

• Memory Constraints: While Oracle will attempt to cache all blocks, it doesn't mean the entire database will fit into memory if the available memory is not enough. If the system doesn't have enough memory to store every block, the database will still perform evictions based on available resources. This command doesn't increase physical memory; it just prioritizes caching.

• Increased Memory Usage: This command will increase memory consumption because the cache size will grow, and more memory will be allocated to caching the blocks. You need to have sufficient memory for this to be effective.

• Specific to Tables and Indexes: This command will affect all user tables and indexes. It does not automatically apply to other structures like the redo log buffer or shared pool, which will still be managed according to normal memory rules.


• System Behavior: Even with the FORCE FULL DATABASE CACHING command, blocks are still subject to eviction when there are memory pressures, but the priority of eviction is adjusted. Oracle will avoid evicting blocks from the cache unless absolutely necessary.

In summary:

Yes, using ALTER DATABASE FORCE FULL DATABASE CACHING will attempt to keep all database blocks in memory, but it's still dependent on available system memory. It doesn't guarantee that all blocks will always stay in memory if there are memory limitations or if the system faces memory pressure, but Oracle will prioritize caching them more aggressively compared to the default behavior.
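You can confirm whether the mode is active from V$DATABASE; note that the ALTER DATABASE FORCE FULL DATABASE CACHING command itself must be issued while the database is mounted but not open:

```sql
-- YES means force full database caching is enabled
SELECT force_full_db_caching FROM v$database;
```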


Alireza Kamrani.

Friday, February 14, 2025

Tuning Oracle Database Using some changes in internal layers of OS

Modifying the Default CFS Scheduler in Oracle Linux


Alireza Kamrani
14- Feb- 2025

In the previous post, I reviewed the scheduling capabilities available in Oracle Linux. In this post I will make changes in this area in order to tune both the Oracle database and the operating system itself.

Note:
♨️ Nothing in this post is a step-by-step recipe for increasing the performance of your production server. Use it to learn what can be changed in the different layers of the operating system, and only customize and apply selected observations to your own environment after testing.

Normally you won't need these changes, but they can deepen your understanding of how to apply tuning at different layers, and in some cases you may improve performance by customizing some of the scripts.

Linux Scheduling Mechanisms:

Oracle Linux, like most modern Linux distributions, uses the Completely Fair Scheduler (CFS) as the default CPU scheduler.
You can tune and modify CFS behavior using kernel parameters and system utilities.

1. Check the Current Scheduler Configuration

Before making changes, check the existing CFS settings:

cat /proc/sys/kernel/sched_latency_ns
cat /proc/sys/kernel/sched_min_granularity_ns
cat /proc/sys/kernel/sched_wakeup_granularity_ns

These parameters control how the scheduler handles task latencies and fairness.

2. Modify CFS Scheduling Parameters

a) Change CFS Latency and Granularity

The default CFS behavior can be adjusted using sysctl or directly modifying /proc/sys/kernel values.

Example: Reduce Latency for Faster Response

echo 5000000 | sudo tee /proc/sys/kernel/sched_latency_ns

# Default: 24ms, change to 5ms

echo 1000000 | sudo tee /proc/sys/kernel/sched_min_granularity_ns

# Default: 6ms, change to 1ms

echo 2000000 | sudo tee /proc/sys/kernel/sched_wakeup_granularity_ns

# Default: 4ms, change to 2ms

What This Does:

• Reduces task switch latency, making the system more responsive.

• Increases CPU time slices for interactive processes.

Persistent Configuration (After Reboot)

To make these changes permanent, add them to /etc/sysctl.conf:

echo "kernel.sched_latency_ns = 5000000" | sudo tee -a /etc/sysctl.conf

echo "kernel.sched_min_granularity_ns = 1000000" | sudo tee -a /etc/sysctl.conf

echo "kernel.sched_wakeup_granularity_ns = 2000000" | sudo tee -a /etc/sysctl.conf

Then apply the changes:

sudo sysctl -p


3. Set CPU Affinity for CFS Tasks

CFS allows binding tasks to specific CPU cores for better performance. Use taskset to manually set affinity:

taskset -c 0,1 ./my_app
# Binds process to CPU cores 0 and 1

To apply the constraint to a group of processes, use a cpuset cgroup (cgroup v1 layout shown; create the group before writing to it):

sudo mkdir /sys/fs/cgroup/cpuset/my_group
echo "0-1" | sudo tee /sys/fs/cgroup/cpuset/my_group/cpuset.cpus

4. Modify Per-Process CFS Nice Levels

CFS uses "nice values" (priority levels from -20 to 19) to determine CPU allocation. Lower values give higher priority.

nice -n -10 ./my_task
# Run task with higher priority

renice -n -5 -p 1234
# Change priority of running process 1234

5. Use CGroups to Control CPU Allocation

You can limit CPU usage for specific processes using cgroups:

Create a CGroup

sudo mkdir /sys/fs/cgroup/cpu/my_group

echo 50000 | sudo tee /sys/fs/cgroup/cpu/my_group/cpu.cfs_quota_us

# Allow 50 ms of CPU time per 100 ms period (50% of one CPU)

echo $$ | sudo tee /sys/fs/cgroup/cpu/my_group/tasks
# Add current process to group

6. Changing the Scheduling Policy (FIFO, RR, DEADLINE)

There is no supported boot parameter that replaces CFS wholesale; FIFO, RR, and DEADLINE are per-process policies, set with chrt:

sudo chrt -r -p 20 1234
# Give PID 1234 the Round Robin policy with priority 20

To inspect the scheduler features currently enabled:

cat /sys/kernel/debug/sched_features

Conclusion

Oracle Linux allows fine-tuning of the CFS scheduler for different workloads. By adjusting latency, granularity, CPU affinity, and priority, you can optimize performance for desktop, real-time, or server environments.

Optimizing CFS Scheduler for Oracle Database on Oracle Linux

Oracle databases are highly I/O-intensive and require careful CPU and memory scheduling to optimize performance.
Since Oracle Linux uses the Completely Fair Scheduler (CFS) by default, fine-tuning it can significantly improve database responsiveness, especially for OLTP (Online Transaction Processing) and OLAP (Analytical Processing) workloads.

1. Key Performance Factors for Oracle Database

When tuning CFS for Oracle DB, consider:
• CPU Scheduling – Ensure Oracle DB processes get priority.
• I/O Scheduling – Optimize disk access for high throughput.
• NUMA Optimization – Improve memory locality for large databases.
• HugePages – Reduce memory fragmentation and TLB misses.

2. Modify CFS Scheduler for Oracle DB Performance

a) Set Oracle Processes to Higher CPU Priority

Oracle DB runs multiple processes (oracle, pmon, dbwr, etc.). These should get more CPU time.

Check Oracle DB Process IDs:

pgrep -u oracle

Increase CPU Priority Using chrt

sudo chrt -f -p 99 $(pgrep pmon)
# Give highest FIFO priority to PMON

sudo chrt -f -p 98 $(pgrep dbwr)
# Higher priority for the DBWR process

sudo chrt -f -p 90 $(pgrep lgwr)
# High priority for the log writer

Why?

• PMON (Process Monitor) needs fast response for crash recovery.

• DBWR (Database Writer) should not be delayed, as it writes dirty pages to disk.

• LGWR (Log Writer) must flush redo logs with minimal delay.

Make it Persistent Using systemd
# if an Oracle service unit exists:
sudo systemctl edit oracle-db.service

Add:

[Service]
CPUSchedulingPolicy=fifo
CPUSchedulingPriority=99

Then reload:

sudo systemctl daemon-reload

b) Optimize I/O Scheduler for Oracle Database

Set the Deadline Scheduler (Recommended)

echo deadline | sudo tee /sys/block/sdX/queue/scheduler

# Replace sdX with the database disk

(On blk-mq kernels, which Oracle Linux 8 uses by default, the scheduler names are mq-deadline and none; cat the scheduler file to see the values valid on your system.)

Why?

• Favors low-latency writes, critical for Oracle redo logs.

• Minimizes starvation of small transactions in OLTP workloads.

• Provides predictable performance for large queries in OLAP.

For SSDs, Use noop Instead

echo noop | sudo tee /sys/block/nvme0n1/queue/scheduler

(On blk-mq kernels the equivalent policy is none.)

Why?

• SSDs have no seek time, so simple FIFO scheduling (noop/none) is best.

Make it Persistent

sudo grubby --update-kernel=ALL --args="elevator=deadline"
sudo reboot

c) Enable CPU Affinity for Oracle DB

To prevent CPU contention, bind Oracle DB to dedicated CPU cores.

Find Available CPUs:

lscpu | grep "CPU(s):"

Bind Oracle to Specific Cores (e.g., 2-5)

for pid in $(pgrep -u oracle); do
    sudo taskset -cp 2-5 $pid
done

Persistent Configuration with cgroups

sudo mkdir /sys/fs/cgroup/cpuset/oracle

echo 2-5 | sudo tee /sys/fs/cgroup/cpuset/oracle/cpuset.cpus

echo 0 | sudo tee /sys/fs/cgroup/cpuset/oracle/cpuset.mems
# cpuset.mems must be set before tasks can join a cpuset (cgroup v1)

echo $$ | sudo tee /sys/fs/cgroup/cpuset/oracle/tasks

d) NUMA Optimization for Oracle Database

On NUMA systems, Oracle DB can suffer from cross-node memory latency.


Check NUMA Nodes:

numactl --hardware

Force Oracle to Use Local Memory Node:

numactl --membind=0 --cpunodebind=0 ./oracle

• This binds Oracle DB to NUMA node 0 for better memory locality.

Persistent NUMA Binding

sudo systemctl edit oracle-db

Add:

[Service]
ExecStart=/usr/bin/numactl --membind=0 --cpunodebind=0 /u01/app/oracle/product/19c/bin/oracle

e) Enable HugePages for Better Memory Management

HugePages reduce memory fragmentation and improve TLB efficiency for Oracle DB.

Check Current HugePages:

grep Huge /proc/meminfo

Calculate HugePages for Oracle (Assume 64GB SGA)

echo $((64 * 1024 / 2))
# 64GB / 2MB HugePage size = 32768

Set HugePages Count

echo 32768 | sudo tee /proc/sys/vm/nr_hugepages

Make It Persistent

echo "vm.nr_hugepages=32768" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
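The calculation above generalizes to a small sketch (assuming 2 MB hugepages and the SGA expressed in whole GB; adjust sga_gb for your instance):

```shell
# Compute vm.nr_hugepages for a given SGA size with 2 MB hugepages
sga_gb=64          # SGA size in GB (assumption; set to your SGA_MAX_SIZE)
hugepage_mb=2      # hugepage size in MB (check Hugepagesize in /proc/meminfo)
pages=$(( sga_gb * 1024 / hugepage_mb ))
echo "vm.nr_hugepages=$pages"
```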

3. Monitor Performance After Changes

After applying these optimizations, monitor Oracle DB performance.

Check CPU and Scheduling Statistics

ps -eo pid,comm,policy,pri,nice | grep oracle

Monitor I/O Performance

iostat -x 1

Check HugePages Usage

grep Huge /proc/meminfo

Verify NUMA Binding

numastat -p $(pgrep -u oracle | head -n 1)


Summary: Oracle DB Tuning on Oracle Linux

By tuning the CFS scheduler, I/O scheduler, CPU affinity, NUMA settings, and HugePages, Oracle Database performance on Oracle Linux can be significantly improved, reducing query latency and improving transaction throughput.

Finally, here is a script for tuning Oracle Database processes:

Oracle Database Performance Optimization Script for Oracle Linux

This script automates CPU, I/O, NUMA, and memory tuning for Oracle Database on Oracle Linux.

📌 Features:

• Sets Oracle DB processes to real-time priority (FIFO)
• Configures the I/O scheduler to deadline (or noop for SSDs)
• Binds Oracle DB to specific CPU cores for optimal performance
• Enables NUMA awareness for memory locality
• Configures HugePages for efficient memory management

Script: oracle_tune.sh

#!/bin/bash

# Oracle Database Tuning Script for Oracle Linux
# Oracle Linux 8/9

echo "Starting Oracle Database Performance Optimization..."

# Variables
ORACLE_USER="oracle"
ORACLE_DISK="sda"
# Change to the correct database disk (block device name, e.g. sda)
CPU_CORES="2-5"
# Adjust CPU affinity as needed
NUMA_NODE="0"
# Adjust based on `numactl --hardware`
HUGEPAGES_COUNT=32768
# Adjust based on Oracle SGA size

echo "Setting Real-Time Priority for Oracle Processes..."
for pid in $(pgrep -u $ORACLE_USER); do
    sudo chrt -f -p 99 $pid
done

echo "Optimizing I/O Scheduler..."
if [[ -e /sys/block/$ORACLE_DISK/queue/scheduler ]]; then
    echo "deadline" | sudo tee /sys/block/$ORACLE_DISK/queue/scheduler
fi

echo "Binding Oracle to CPUs: $CPU_CORES..."
for pid in $(pgrep -u $ORACLE_USER); do
    sudo taskset -cp $CPU_CORES $pid
done

echo "Enabling NUMA Optimization..."
# numactl cannot re-bind a running process; migratepages (from the numactl
# package) moves a process's already-allocated pages to the target node
for pid in $(pgrep -u $ORACLE_USER); do
    sudo migratepages $pid all $NUMA_NODE
done

echo " Configuring HugePages..."
echo $HUGEPAGES_COUNT | sudo tee /proc/sys/vm/nr_hugepages
echo "vm.nr_hugepages=$HUGEPAGES_COUNT" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

echo "Oracle Database Optimization Completed!"

How to Use the Script

Make the Script Executable

chmod +x oracle_tune.sh

Run as Root (or with sudo)

sudo ./oracle_tune.sh

Verify the Optimizations

• Check CPU Priority:
ps -eo pid,comm,policy,pri | grep oracle

• Check I/O Scheduler:
cat /sys/block/sda/queue/scheduler

• Check NUMA Binding:
numastat -p $(pgrep -u oracle | head -n 1)

• Check HugePages:
grep Huge /proc/meminfo

📌 Notes

• Adjust ORACLE_DISK, CPU_CORES, and HUGEPAGES_COUNT based on your system.

• Run lsblk to find the correct disk.

• Use numactl --hardware to determine NUMA nodes.

• Set SGA size properly when configuring HugePages.

Bonus: Run at Startup (Optional)

To apply these optimizations at every reboot:

sudo cp oracle_tune.sh /etc/init.d/
sudo chmod +x /etc/init.d/oracle_tune.sh
sudo ln -s /etc/init.d/oracle_tune.sh /etc/rc.d/



Conclusion

This script automates CPU scheduling, I/O tuning, NUMA optimization, and HugePages configuration for Oracle Database, improving performance on Oracle Linux.

Automatically Running Script:

In Oracle Linux 8 (OEL8), simply placing the script in /etc/rc.d/ will not automatically execute it at boot. Since OEL8 uses systemd, you need to create a systemd service to ensure the script runs at startup.

Steps to Automatically Run the Script at Boot

Move the Script to a System Directory

First, place the script in a proper location:

sudo mv oracle_tune.sh /usr/local/bin/
sudo chmod +x /usr/local/bin/oracle_tune.sh

Create a systemd Service File

Now, create a service file for systemd:

sudo nano /etc/systemd/system/oracle_tune.service

Paste the Following Configuration:

[Unit]
Description=Oracle Database Performance Optimization
After=network.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/oracle_tune.sh
RemainAfterExit=yes
User=root

[Install]
WantedBy=multi-user.target

Reload systemd and Enable the Service

systemctl daemon-reload
systemctl enable oracle_tune.service
systemctl start oracle_tune.service

Verify That the Service Runs at Boot

Check the status:

systemctl status oracle_tune.service

Test a reboot:

sudo reboot

After the system restarts, verify:

systemctl status oracle_tune.service

Summary

Now, Oracle Linux will automatically optimize your database performance at every reboot.

Alireza Kamrani. 
Database Consultant, ACE.

Thursday, February 13, 2025

Scheduling Mechanisms in Oracle Linux

Linux provides various synchronization and scheduling mechanisms to handle concurrency and avoid race conditions in multi-threaded or multi-process environments. Here are some key methods:


1. Spinlock


• A spinlock is a busy-wait lock that continuously checks for the lock's availability.


• It is useful in cases where the lock is expected to be held for a very short time.


• Since it does not put the thread to sleep, it avoids context switching overhead.


• Example: Used in kernel space for short critical sections.


DEFINE_SPINLOCK(lock); /* statically initialized spinlock */

spin_lock(&lock); // Acquire the lock

// Critical section

spin_unlock(&lock); // Release the lock


2. Mutex (Mutual Exclusion)


• A mutex is a sleep-based locking mechanism.


• If a thread cannot acquire the lock, it goes into a sleep state instead of spinning.


• Only the thread that locked it can unlock it.


• Used in both user-space (pthread mutex) and kernel-space.


Example in user space (POSIX pthreads):


pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

pthread_mutex_lock(&lock); // Acquire lock

// Critical section

pthread_mutex_unlock(&lock); // Release lock


Example in kernel space:


struct mutex my_mutex;

mutex_init(&my_mutex);

mutex_lock(&my_mutex); // Acquire lock

// Critical section

mutex_unlock(&my_mutex); // Release lock


3. Semaphore


• A semaphore is a signaling mechanism, allowing multiple threads to access a resource but limiting concurrency.


• It has a counter that determines the number of threads that can access the resource at the same time.


• Two types: 


• Binary Semaphore (similar to mutex, but any thread can unlock).


• Counting Semaphore (allows multiple resources to be acquired).


Example in user space:


sem_t sem;

sem_init(&sem, 0, 1); // Initialize semaphore with 1 resource

sem_wait(&sem); // Acquire resource (decrements counter)

// Critical section

sem_post(&sem); // Release resource (increments counter)


Example in kernel space:


struct semaphore my_sem;

sema_init(&my_sem, 1);

down(&my_sem); // Acquire semaphore

// Critical section

up(&my_sem); // Release semaphore


4. Read-Write Lock


• Allows multiple readers or a single writer at a time.


• Prevents multiple writers from accessing simultaneously.


Example in kernel space:


DEFINE_RWLOCK(my_rwlock);

read_lock(&my_rwlock);   // Multiple readers can acquire concurrently
// Read operation
read_unlock(&my_rwlock);

write_lock(&my_rwlock);  // Writers acquire an exclusive lock
// Write operation
write_unlock(&my_rwlock);


5. RCU (Read-Copy-Update)


• A specialized synchronization mechanism optimized for read-heavy workloads.


• Allows readers to access data without blocking while writers update a copy.


Example in kernel space:


rcu_read_lock();
// Read critical section
rcu_read_unlock();

synchronize_rcu(); // Writer side: ensures a safe update


6. Futex (Fast User-Space Mutex)


• A hybrid mechanism that primarily works in user space but can use the kernel if contention occurs.


• Used for efficient thread synchronization with minimal kernel interaction.


Example in user space:


syscall(SYS_futex, &futex_var, FUTEX_WAIT, expected_value, NULL); 


Each of these methods is suited for different use cases depending on performance and synchronization needs.



How Oracle Linux handles scheduling:


Oracle Linux, like other Linux distributions, uses the Completely Fair Scheduler (CFS) as its primary process scheduler. It is the default scheduler in the Linux kernel and is responsible for managing task execution efficiently.


Key Scheduling Methods in Oracle Linux


1. Completely Fair Scheduler (CFS) – Default CPU Scheduler


• Used for scheduling processes in the system.


• Assigns each task a virtual runtime based on the weight of the task (priority).


• Ensures fair distribution of CPU time across processes.


• Implements an O(log N) scheduling complexity using a red-black tree.


CFS Key Features:


• Uses a red-black tree to manage processes.


• Balances CPU time dynamically based on process weight (priority).


• Supports CPU affinity to optimize multi-core scheduling.


• Can be tuned using /proc/sys/kernel/sched_* parameters.


2. Real-Time Scheduling – FIFO and Round Robin


• For real-time tasks, Oracle Linux supports SCHED_FIFO (First In, First Out) and SCHED_RR (Round Robin).


• These are priority-based scheduling policies where high-priority tasks run until completion or voluntarily yield CPU time.



Example: Setting a real-time priority for a process:


chrt -f 10 ./realtime_task # FIFO priority 10

chrt -r 20 ./realtime_task # Round Robin priority 20 


3. Deadline Scheduler – For Real-Time and Latency-Sensitive Workloads


• Uses Earliest Deadline First (EDF) scheduling.


• Ideal for applications requiring guaranteed execution within a specific timeframe.


• Controlled using SCHED_DEADLINE.


4. Task and Thread Synchronization Mechanisms in Oracle Linux


Oracle Linux, like other enterprise Linux distributions, uses several synchronization methods to manage concurrency and avoid race conditions. These include:


• Spinlocks: Used in the kernel for short critical sections.


• Mutexes: Used in both user space (via pthread_mutex_t) and kernel space (struct mutex).


• Semaphores: Used for counting and controlling access to shared resources.


• Read-Write Locks: Used for allowing multiple readers but a single writer.


• Futex (Fast User-Space Mutex): Used by the Linux kernel to optimize user-space thread synchronization.


5. CPU Affinity and NUMA Optimization


• Oracle Linux optimizes scheduling for multi-core and NUMA (Non-Uniform Memory Access) systems.


• CPU affinity can be set using taskset and numactl to bind processes to specific cores or memory nodes.


Example:


taskset -c 0,1 ./app 

# Bind to CPU cores 0 and 1 

numactl --membind=0 --cpunodebind=0 ./app 

# Bind to NUMA node 0 


Conclusion


Oracle Linux primarily uses the Completely Fair Scheduler (CFS) for general scheduling and supports FIFO, Round Robin, and Deadline Scheduling for real-time applications. It also employs various synchronization methods like mutexes, semaphores, and spinlocks to manage concurrency efficiently.




Monday, February 10, 2025

Index Prefix Compression in Oracle With Introduction of a New Advanced Index Compression Capability with Oracle 23c


Alireza Kamrani 

  02/10/2025


Index Key Compression, also referred to as Index Prefix Compression, is a compression feature included with Oracle Database Enterprise Edition that can reduce the overall size of indexes; it helps both multi-column unique indexes and non-unique indexes alike.


As a result, it is one of the most critical index optimization features available to DBAs for effectively managing the space used by the indexes.

Index Key Compression allows for compressing portions of the key values in an index segment (or Index Organized Table (IOT)), by reducing the storage inefficiencies of storing repeating values multiple times. 

It compresses the data by splitting the index key into two parts:

  • Prefix Entries: the leading group of columns, which are potentially shared across multiple key values
  • Suffix Entries: the suffix columns, which are unique to every index key.

As the prefixes are potentially shared across multiple keys in a block, these can be stored more optimally (only once per block) and shared across multiple suffix entries, resulting in the index data being compressed.

Index Key compression is performed in the leaf blocks of a B-Tree index. 


The keys are compressed locally within an index leaf block, meaning that both the prefix and suffix entries are stored within the same block.


Suffix entries make up the compressed representation of the index key. Each one of these compressed rows refers to the corresponding prefix, which is stored in the same block. 

By storing the prefixes and suffixes locally in the same block, each index block is self-contained, and it is possible to construct the complete key without incurring any additional block I/O. Reconstructing the key is a very inexpensive, memory-only operation. 


For new indexes and index partitions, enabling Index Key Compression is easy - simply CREATE the index or index partition and specify the index compression clause. 


See the example below:


CREATE INDEX idxname ON tabname(col1, col2, col3) COMPRESS;


An existing index or index partition can be REBUILT compressed using the syntax shown below:


ALTER INDEX idxname REBUILD COMPRESS;


By default, the prefix consists of all indexed columns for non-unique indexes, and all indexed columns excluding the last one for unique indexes. 

Alternatively, it is possible to specify the prefix length as part of the index compression clause, which is the number of columns in the prefix entries:


CREATE INDEX idxname ON tabname(col1, col2, col3) COMPRESS 2;


The number after the COMPRESS keyword (the prefix column length) denotes how many leading columns to compress. 


The maximum prefix length for a non-unique index is the number of columns in the index key, and for a unique index is the number of key columns minus one.
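
As a hedged illustration of the unique-index limit described above (the table and column names are hypothetical), a two-column unique index can only share its leading column:

```sql
-- Hypothetical unique index on (order_id, line_no).
-- The maximum prefix length for a unique index is the number of key
-- columns minus one, so COMPRESS 1 is the highest (and only) valid
-- prefix length here; an order_id repeated across many lines is then
-- stored once per leaf block as a prefix entry.
CREATE UNIQUE INDEX order_lines_uk
  ON order_lines (order_id, line_no) COMPRESS 1;
```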

Prefix entries are written to the index block only if the index block does not already contain that prefix. 

They are available for sharing across multiple suffix entries immediately after being written and remain available until the last referencing suffix entry is deleted from the block. 


Although key compression reduces the storage requirements of an index by sharing parts of keys across multiple entries, there is a small CPU overhead to reconstruct the key column values during index lookup or scans, which is minimized by keeping the prefixes locally in the block.


Index Key Compression achieves a more optimal representation of an index, and ensures that it stays permanently compressed without any subsequent overhead on the maintenance operations. 

As a result, it has a positive impact on the storage and space savings, but also achieves secondary benefits such as better cache efficiency, fewer leaf blocks and less deep tree resulting in potentially fewer logical IOs and cheaper execution plans. 


In many cases, the overhead of reconstructing the complete key is offset by the more efficient block representation, the ability to fit many more rows in a given block, the reduction in I/O required to read the index rows, and better buffer cache efficiency, such that applications see an improvement in overall performance.


Index Key compression can be extremely useful in many different scenarios, a few of which are listed below:

  • Index Key Compression can be used with a non-unique index where ROWID is appended to make the key unique. If such an index is compressed using key compression, the duplicate key is stored only once as a prefix entry in the index block without the ROWID. The remaining rows become suffix entries consisting of only the ROWID
  • Index Key Compression can be used with a unique multicolumn index (key compression is not possible for unique single column index because there is a unique piece but there are no prefix grouping pieces to share)
  • Index Key Compression can be used with Index Organized Tables. The same considerations as unique multicolumn indexes apply
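
The first scenario above can be sketched as follows (the table and index names are hypothetical). For a non-unique index, plain COMPRESS uses all indexed columns as the prefix:

```sql
-- Hypothetical non-unique index on a low-cardinality column.
-- With COMPRESS (default prefix = all columns for a non-unique index),
-- each distinct status value is stored once per leaf block as a prefix
-- entry; the suffix entries then carry only the ROWIDs.
CREATE INDEX orders_status_ix ON orders (status) COMPRESS;
```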


The key to getting good index compression is identifying which indexes will benefit from it and correctly specifying the prefix column length for those indexes. This requires an understanding of the data in order to choose the most optimal prefix column count. 

If you want to estimate the ideal compression ratio and the percentage of leaf block space that could be saved, you need to look at INDEX_STATS view after ANALYZING the index, for example:


ANALYZE INDEX indexname VALIDATE STRUCTURE; 


SELECT name, height, blocks, opt_cmpr_count, opt_cmpr_pctsave FROM index_stats 
WHERE name = 'INDEXNAME'; 


“OPT_CMPR_COUNT” indicates the number of columns to compress in the index to get maximum space savings in the leaf blocks (prefix column length).


“OPT_CMPR_PCTSAVE” indicates the percentage reduction in leaf block space used if the index is compressed using this prefix length.
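
Putting the advisor output to use might look like the following sketch, where the index name and the recommended count of 2 are hypothetical:

```sql
-- Suppose the query against INDEX_STATS reported OPT_CMPR_COUNT = 2
-- and a meaningful OPT_CMPR_PCTSAVE; rebuild with that prefix length:
ALTER INDEX idxname REBUILD COMPRESS 2;
```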

Compression can be very beneficial when the prefix columns of an index are repeated many times within a leaf block. 


However, if the leading columns are very selective, or if there are not many repeated values for the prefix columns, then index prefix compression may not be the best solution. 


In these scenarios, Oracle still creates prefix entries storing all unique combinations of compressed column values within a leaf block. The index rows will refer to prefix entries that are rarely (if at all) shared by other index rows. 

Thus, it is possible that compression in these cases is not beneficial, and could end up increasing the index size due to the overhead of storing all of the prefix entries.


For index compression to be beneficial, ensure that low cardinality columns are the leading columns in a concatenated index. 


Otherwise, there is a risk of getting negative compression such that leaf blocks can no longer store as many keys as their non-compressed counterparts. 


Additionally, there is no point in compressing a single column unique index or compressing every column in a concatenated, multi-column unique index. 

In these cases, compression will result in an index structure that increases in size rather than decreasing (negative compression) due to all the overhead associated with having prefix entries for every index row.


The key to getting good index compression is identifying which indexes will benefit from it and correctly specifying the prefix column length. 


The discussion above on how to figure out the optimal Prefix Column Length can help, but also please note the following:

  • Implementing compression requires a deep understanding of the data in order to choose the most optimal prefix column count
  • Specified prefix column count may not be optimal to produce the best compression ratio for every block in the index
  • Requires running ANALYZE INDEX to obtain an optimal prefix column count, which produces the optimal count for the index as a whole. This is not at the granularity of a block, so it may not yield the best compression ratio. Additionally, running ANALYZE INDEX takes an exclusive lock on the table, effectively making the table “offline” for this period
  • Possible to get negative compression, as pointed out earlier, such as in the case where the specified prefix columns are unique in a block

Application developers and DBAs need to be very selective on which indexes to compress and correctly set the prefix column count for these indexes. 

Oracle protects you under certain obvious conditions, but it is your responsibility to compress the indexes in the right manner.


For more information about Index Organized Tables (IOTs) and compression, as well as how to find the optimal index key compress level for indexes, please see these MOS notes:

Doc ID 1555637.1 – Index Organized Tables (IOTs) and Compression
Doc ID 601690.1 – How to find the optimal Index Key COMPRESS level for indexes


New Advanced Index Compression Capability with Oracle Database 23c

Advanced Index Compression, a feature of Advanced Compression, simplifies index compression. 

Advanced Index Compression enables the highest levels of data compression and provides enterprises with storage savings and query performance improvements due to reduced I/O. 


Advanced Index Compression supports multiple compression levels: LOW and HIGH. 

This discussion will focus on the LOW level of index compression. 


Advanced Index Compression LOW computes the prefix column count for compressed indexes. Rather than using a static prefix count for all index leaf blocks, it aims to compute an optimal prefix count for every index leaf block. 


The correct and most optimal numbers of prefix columns are calculated automatically on a block-by-block basis and thus produce the best compression ratio possible. 

It is possible to have different index leaf blocks compressed with different prefix column counts or not be compressed at all if there are no repeating prefixes.


So, what is new with Advanced Index Compression LOW with Oracle Database 23c? 

Advanced Index Compression LOW for Index-Organized Tables (IOTs)

An index-organized table is a table stored in a variation of a B-tree index structure. 

In contrast, a heap-organized table inserts rows where they fit.

In an index-organized table, rows are stored in an index defined on the primary key for the table. Each index entry in the B-tree also stores the non-key column values. 


Thus, the index is the data, and the data is the index. 


Applications manipulate index-organized tables just like heap-organized tables, using SQL statements.


IOTs are popular because they provide fast random access by primary key without duplicating primary key columns in two structures – a heap table and an index. 


Index-Organized Tables can now be compressed with Advanced Index Compression (LOW). 


Advanced Index Compression LOW can be enabled easily by specifying the COMPRESS option for indexes. 


For example:


SQL> create table tiot (c1 number, c2 number, c3 number, c4 number, primary key (c1, c2)) organization index compress advanced low;
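
For ordinary B-tree indexes, the same feature can be enabled at creation time or via a rebuild; a minimal sketch follows (the index, table, and column names are placeholders):

```sql
-- Enable Advanced Index Compression LOW on a new index ...
CREATE INDEX idxname ON tabname (col1, col2, col3) COMPRESS ADVANCED LOW;

-- ... or on an existing index via a rebuild:
ALTER INDEX idxname REBUILD COMPRESS ADVANCED LOW;
```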


In earlier releases, IOTs only supported Oracle’s prefix key compression for index compression. 


Usage of prefix key compression required user analysis and had the possibility of negative compression (where the overhead of compression outweighed the compression benefits). 

This new feature extends Advanced Index Compression (LOW) to IOTs, allowing users to enable compression for all IOTs without the possibility of negative compression and without any user analysis required. 


So, what does this mean for your organization?

Average IOT storage reduction can range from 2x to 5x. 

Using 2x as an example, this means that the amount of space consumed by uncompressed data will be two times larger than that of the compressed data. 


By reducing their IOT storage requirements, IT managers can reduce, and sometimes eliminate their need to purchase new storage. 


The cost of decompressing a block compressed with Advanced Index Compression LOW is compensated by the fact that in most scenarios, the database would be scanning a smaller number of blocks. 


So, in general, IOT compression typically won’t compromise query performance (no significant degradation).


Use the Free Compression Advisor to see how well your indexes will compress

The “DBMS_COMPRESSION” PL/SQL package (commonly called compression advisor) is included with Oracle Database Enterprise Edition and gathers compression-related information within a database environment. 


The output of running compression advisor is an estimation of the compression ratio for the specific table or partition that was the target of compression advisor. 

Compression advisor provides organizations with the storage reduction information needed to make compression-related usage decisions.
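
A hedged PL/SQL sketch of invoking the advisor for an index is shown below. The owner, index name, and scratch tablespace are placeholders, and the exact parameter list of DBMS_COMPRESSION.GET_COMPRESSION_RATIO should be checked against your release's documentation (the index-aware OBJTYPE parameter appeared in 12.2):

```sql
-- Hedged sketch: estimate the Advanced LOW compression ratio for a
-- hypothetical index SCOTT.EMP_IDX, using USERS as the scratch
-- tablespace for the advisor's temporary objects.
SET SERVEROUTPUT ON
DECLARE
  l_blkcnt_cmp    PLS_INTEGER;
  l_blkcnt_uncmp  PLS_INTEGER;
  l_row_cmp       PLS_INTEGER;
  l_row_uncmp     PLS_INTEGER;
  l_cmp_ratio     NUMBER;
  l_comptype_str  VARCHAR2(100);
BEGIN
  DBMS_COMPRESSION.GET_COMPRESSION_RATIO(
    scratchtbsname => 'USERS',
    ownname        => 'SCOTT',
    objname        => 'EMP_IDX',
    subobjname     => NULL,
    comptype       => DBMS_COMPRESSION.COMP_INDEX_ADVANCED_LOW,
    blkcnt_cmp     => l_blkcnt_cmp,
    blkcnt_uncmp   => l_blkcnt_uncmp,
    row_cmp        => l_row_cmp,
    row_uncmp      => l_row_uncmp,
    cmp_ratio      => l_cmp_ratio,
    comptype_str   => l_comptype_str,
    objtype        => DBMS_COMPRESSION.OBJTYPE_INDEX);
  DBMS_OUTPUT.PUT_LINE('Estimated compression ratio: ' || l_cmp_ratio);
END;
/
```

The returned CMP_RATIO is the estimated ratio of uncompressed to compressed blocks, which maps directly onto the 2x to 5x storage-reduction figures discussed above.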


Alireza Kamrani 
