Oracle Data access methods in Action
Alireza
Kamrani 20 Jul 2025
Data Accesses
To use the
data stored in database, applications need to expose it or retrieve it. That is
the task of data accesses provided by Oracle.
Based on
data locality, they are differentiated as physical read and logical read.
Logical read is further divided into consistent gets and current gets.
Physical
Read
To access
any data, firstly Oracle has to move it from disk (persistent mass storage) to
memory (volatile main storage), that is, physical read (disk read, db file
read, cold read).
Oracle
provides 3 basic approaches of db fille read:
(a). db fille sequential read
(b). db file scattered read
(c). db file parallel read
In this
Section, we will look into different access paths and investigate their
executions with tools like:
(1). Sql Trace
(2). Dtrace
(3). Oracle View: v$filestat and v$iostat_file
Note: All
tests are done in Oracle 12.1.0.2 on Solaris.
In the following
test, we create a table and one index on it, each row occupies about 1 DB block
(db block size = 8192).
Full test
code is appended at the end of this section.
SQL> create
table test_tab tablespace test_ts as
select level
x, rpad('ABC', 3500, 'X') y, rpad('ABC', 3500, 'X') z from dual connect by
level <= 1e4;
SQL> create
index test_tab#i1 on test_tab(x) tablespace test_ts;
SQL> select
round(bytes/1024/1024) mb, blocks from dba_segments where segment_name =
'TEST_TAB';
--80 10240
DB File
Read Access Path
We will run
4 variants of access path tests, and measure their performance in Sql Trace
(event 10046) and Dtrace (see appended Dtrace Script).
Test-1
Single Read
As the first
test, we will select 333 adjacent rows by rowid. Here the Sql Trace output:
SQL >
exec db_file_read_test('single', 1, 333);
-- adjacent
rowid, single block read, 'db file sequential read'
SELECT /*+
single_read */ Y FROM TEST_TAB T WHERE ROWID = :B1
call count
cpu
elapsed disk query current rows
------- ------
-------- ---------- ----------
---------- ---------- ----------
Parse 1
0.00 0.00 0 0 0
0
Execute 333 0.01 0.01 0 0
0 0
Fetch 333 0.01 0.01 641 333 0 333
------- ------ -------- ---------- ---------- ---------- ----------
----------
total 667 0.02 0.02 641
333 0 333
Row Source
Operation
---------------------------------------------------
TABLE ACCESS
BY USER ROWID TEST_TAB (cr=1 pr=8 pw=0 time=160 us cost=1 size=3513 card=1)
Event waited
on Times Max. Wait Total Waited
----------------------------------------
Waited ---------- ------------
db file
scattered read 44 0.00
0.00
db file
sequential read 289 0.00
0.00
To read 333
rows, we perform 44 scattered read and 289 sequential reads, in total, 333
reads.
However, 641
blocks are read into memory because of scattered read.
Dtrace
output reveals more details about lower OS layer calls:
PROBEFUNC FD
RETURN_SIZE COUNT
lseek 260
0 44
readv 260
65536 44
pread 260
8192 289
PROBEFUNC FD
MAX_READ_Blocks
pread 260 1
readv 260
8
TOTAL_SIZE = 5251072 , TOTAL_READ_Blocks = 641 ,
TOTAL_READ_CNT = 333
readv 260
value ------------------------------------------
Distribution ---------------------------------------- count
8192 |
0
16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 43
32768 |@ 1
65536 | 0
pread 260
value ------------------------------------------ Distribution
------------------------------------------ count
2048 | 0
4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 226
8192 |@@@@@@@@ 60
16384 | 2
32768 | 0
65536 | 1
131072 | 0
44 scattered
read are ful_lled by 44 readv from filedescriptor FD: 260 (which are proceeded
by 44 lseek), each of which fetches 65536 bytes (8 DB blocks).
289
sequential read are done by 289 pread, each of which fetches 8192 bytes
(1 DB block).
Totally we
read 641 DB blocks in 333 read OS calls.
Now we look
the Dtrace quantize (frequency distribution diagram) output, in which the
values in all lines
are always
increased by power-of-two in nanoseconds. Each line indicates the count of the
number of
elements
greater than or equal to the corresponding value, but less than the next larger
row value. It is
similar to
Oracle Wait Event Histogram (for instance, v$event histogram).
The whole
elapsed time (multiplied by 1.5 to get average value) can be estimated as:
readv:
(16384*43 + 32768*1)*1.5 = 1105920
pread:
(4096*226 + 8192*60 + 16384*2 + 65536*1)*1.5 = 2273280
total:
1105920 + 2273280 = 3379200
The total
elapse time of 3 millisecond (3379200 ns) in Dtrace is much less than xplan 20
millisecond (0.02 second) since Dtrace only collects time of OS IO activities,
the other 17 ms could be consumed in the
DB side. For
example, in the above xplan, execute phase took 10 millisecond (0.01 second),
whereas two Wait Events there: db file scattered read and db file sequential
read having Total Waited equal to 0.00
(the minimum
time unit in xplan is centisecond, which seems inherited from old Oracle
hundredths of a second counting).
We can also
compare elapsed time per block read for readv (8 blocks per read request), and pread
(1 block per read request),
thereby evaluate the exact performance difference
between single block read and multi block read.
The result
shows that readv is 2 times faster than pread per block read.
readv:
(16384*43 + 32768*1)*1.5/8/44 = 3142
pread:
(4096*226 + 8192*60 + 16384*2 + 65536*1)*1.5/289 = 7866
Test-2
Scattered Read
In the
second test, we also select 333 rows by rowid. Instead of adjacent rows, we
read one row after
skipping 10
rows (see appended Test Code).
Here the Sql
Trace output:
SQL >
exec db_file_read_test('scattered', 1, 333);
-- jumped
rowid, scattered read, 'db file scattered read'
SELECT /*+
scattered_read */ Y FROM TEST_TAB T WHERE ROWID =: B1
call count cpu elapsed
disk query current
rows
------- ------
-------- ---------- ----------
---------- ---------- ----------
Parse 1
0.00 0.00 0 0
0 0
Execute 333 0.00 0.00 0 0 0 0
Fetch 333 0.02 0.02 2664 333
0 333
------- ------ --------
---------- ---------- ----------- ---------- ---------
total 667 0.02 0.02
2664 333 0 333
Row Source
Operation
---------------------------------------------------
TABLE ACCESS
BY USER ROWID TEST_TAB (cr=1 pr=8 pw=0 time=156 us cost=1 size=3513 card=1)
Elapsed
times include waiting on following events:
Event waited Times Max. Wait Total
Waited
--------------------------- Waited ---------- ------------
db file scattered read 333
0.00 0.00
Oracle
chooses db file scattered read to fetch all 333 rows with 2664 disk reads.
But xplan looks identical as single read, so
xplan alone is not able to reveal the difference.
But Dtrace output shows the di_erence:
------------------------------ dtrace
------------------------------
PROBEFUNC FD RETURN_SIZE COUNT
Lseek 260 0 91
readv 260
65536
333
PROBEFUNC FD
MAX_READ_Blocks
Readv 260 8
TOTAL_SIZE = 21823488 , TOTAL_READ_Blocks = 2664 ,
TOTAL_READ_CNT = 333
readv 260
value ------------- Distribution ------------- count
8192 | 0
16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 290
32768 |@@@@@ 43
65536 | 0
Each readv request
returns 8 DB blocks, 333 readv accumulated to exactly 333 x 8 = 2664.
In other hand,
91 lseek moving probably indicates that most of blocks are located next to each
other.
Test-3
Parallel Read
In the next
test, we read 333 rows by index range scan. Sql Trace shows the third type of
db file read:
db file parallel
read. In the output, we also include part of Raw Trace file.
SQL > exec db_file_read_test('parallel', 1, 333);
SELECT /*+ index(t test_tab#i1) parallel_read */ MAX(Y) FROM
TEST_TAB T WHERE X BETWEEN 1 AND :B1
call count
cpu elapsed disk query current rows
------- ------
-------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0
0
Execute 1
0.00 0.00 0 0
0 0
Fetch 1
0.00 0.00 344 335 0 1
------- ------
-------- ---------- ---------- ---------- ---------- ----------
total 3
0.00 0.00 344 335 0 1
Row Source
Operation
---------------------------------------------------
SORT
AGGREGATE (cr=335 pr=344 pw=0 time=3760 us)
FILTER
(cr=335 pr=344 pw=0 time=1698 us)
TABLE ACCESS
BY INDEX ROWID BATCHED TEST_TAB (cr=335 pr=344 pw=0 time=1361 us cost=168
size=1167165 card=333)
INDEX RANGE
SCAN TEST_TAB#I1 (cr=2 pr=8 pw=0 time=279 us cost=1 size=0 card=333)(object id
2260477)
Event waited
on Times
Max. Wait Total
Waited
----------------------------------------
Waited------------ ---------- ------------
db file
scattered read 4 0.00 0.00
db file
parallel read 2
0.00 0.00
-- Raw Trace
File --
'db file
scattered read' ela= 49 file#=917 block#=10368 blocks=8 obj#=2260477 (Index
TEST_TAB#I1)
'db file
scattered read' ela= 27 file#=917 block#=128 blocks=8 obj#=2260476 (Table
TEST_TAB)
'db file
scattered read' ela= 21 file#=917 block#=136 blocks=8 obj#=2260476
'db file
parallel read' ela= 422 files=1 blocks=127 requests=127 obj#=2260476
'db file
parallel read' ela= 334 files=1 blocks=127 requests=127 obj#=2260476
'db file
scattered read' ela= 264 file#=917 block#=409 blocks=66 obj#=2260476
Look Raw
Trace file, first 3 lines are db file scattered read with blocks=8 (one of
which is to read index
TEST TAB#I1),
then 2 lines of db file parallel read with both blocks=127 and requests=127,
last line is
one db file
scattered read with blocks=66. In total, we made 3*8 + 2*127 + 66 = 344 disk
reads in 258 read requests.
Dtrace Ouput shows more details of OS calls:
PROBEFUNC FD
RETURN_SIZE COUNT
pread 260
540672 1
lseek 260
0 2
readv 260
65536 3
pread 260
8192
254
PROBEFUNC FD
MAX_READ_Blocks
Readv 260 8
pread 260
66
TOTAL_SIZE = 2818048 , TOTAL_READ_Blocks = 344 ,
TOTAL_READ_CNT = 258
readv 260
value -------------------------------- Distribution
------------------------------- count
8192 | 0
16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@ 2
32768 |@@@@@@@@@@@@@ 1
65536 | 0
pread 260
value -------------------------------- Distribution --------------------------------
count
2048 | 0
4096 |@@@@@@@@@@@@@@@@@@@@@@@@@ 196
8192 |@@@@@@@@ 52
16384 | 2
32768 |
0
65536 |@
4
131072 | 1
262144 |
0
Crosschecking
Sql Raw Trace with Dtrace, we can see:
3 blocks=8
db file scattered read are implemented by 3 readv with RETURN SIZE=65536 each.
2 blocks=127
db file parallel read are satisfied by 254 pread with RETURN SIZE=8192 each.
1 blocks=66
db file scattered read is done by 1 pread with RETURN SIZE=540672(=66*8192).
In total, we
read 344 DB blocks by 258 (=3+254+1) OS read calls.
The last db file
scattered read with blocks=66 also shows that one pread can read 66 blocks,
much higher than db file multiblock read count=32 configured in this database.
Since 66 is
not divisible by 32, it is probably an OS disk read optimization (disk read
merging) for Oracle "Batched" reads, which is visible in xplan as
"table access by index rowid batched".
Such kind of
pread is triggered after low level OS optimization, that is probably why db
file multiblock read count=32 has no effect there.
"Batched"
reads is controlled by Oracle 12c hidden parameter optimizer batch table access
by rowid (enable table access by ROWID IO batching), or 11g nlj batching
enabled (enable batching of the RHS IO in NLJ).
For example,
"Batched" can be disabled by:
SELECT /*+
index(t test_tab#i1) opt_param('_optimizer_batch_table_access_by_rowid',
'false') parallel_read */ MAX(Y)
FROM
TEST_TAB T WHERE X BETWEEN 1 AND :B1;
In xplan, db
file parallel read is indicated with Times Waited being 2, but real OS calls are
254 pread
requests. We
will discuss it later on AIO read.
By the way,
we have 3 readv, but only 2 lseek, so there are probably 2 readv share one lseek.
From above
Sql Trace output and Dtrace output, we can see that pread can ful_ll both db file
parallel
read and db files
cattered read. Back to Test-1 Single Read, in which db file sequential read is
also
performed by
pread, we can say pread is universal for all 3 types of db file reads.
Test-4
Full Read
As the last
test, we read 333 rows by a full table scan. Here the Sql Trace output
including its raw trace
lines:
SQL >
exec db_file_read_test('full', 1, 333);
SELECT /*+
full_read */ MAX(Y) FROM TEST_TAB T WHERE ROWNUM <= :B1
call count
cpu elapsed disk query
current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1
0.00 0.00 0 0 0 0
Execute 1
0.00 0.00 0 0 0 0
Fetch 1
0.00 0.00 342 342 3 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 0.00 0.00 342 342 3 1
Row Source Operation
---------------------------------------------------
SORT AGGREGATE (cr=342 pr=342 pw=0 time=3788 us)
COUNT STOPKEY (cr=342 pr=342 pw=0 time=1371 us)
TABLE ACCESS FULL TEST_TAB (cr=342 pr=342 pw=0 time=925 us
cost=99 size=1169334 card=334)
Event waited on Times Max. Wait
Total Waited
---------------------------------------- Waited------------------ ---------- ------------
db file sequential read
2 0.00 0.00
db file scattered read 23 0.00 0.00
-- Raw Trace File --
'db file sequential read' ela= 19 file#=917 block#=130
blocks=1 obj#=2260476 tim=647299975335
'db file sequential read' ela= 14 file#=3 block#=768 blocks=1
obj#=0 tim=647299975399
-- UNDO file#=3
/oratestdb/oradata/testdb/undo01.dbf
'db file scattered read' ela= 22 file#=917 block#=131
blocks=5 obj#=2260476 tim=647299975501
'db file scattered read' ela= 25 file#=917 block#=136
blocks=8 obj#=2260476 tim=647299975609
'db file scattered read' ela= 25 file#=917 block#=145
blocks=7 obj#=2260476 tim=647299975713
'db file scattered read' ela= 23 file#=917 block#=152
blocks=8 obj#=2260476 tim=647299975806
'db file scattered read' ela= 25 file#=917 block#=161
blocks=7 obj#=2260476 tim=647299975901
'db file scattered read' ela= 24 file#=917 block#=168
blocks=8 obj#=2260476 tim=647299975994
'db file scattered read' ela= 23 file#=917 block#=177
blocks=7 obj#=2260476 tim=647299976088
'db file scattered read' ela= 23 file#=917 block#=184
blocks=8 obj#=2260476 tim=647299976178
'db file scattered read' ela= 23 file#=917 block#=193
blocks=7 obj#=2260476 tim=647299976270
'db file scattered read' ela= 22 file#=917 block#=200
blocks=8 obj#=2260476 tim=647299976364
'db file scattered read' ela= 23 file#=917 block#=209
blocks=7 obj#=2260476 tim=647299976465
'db file scattered read' ela= 22 file#=917 block#=216
blocks=8 obj#=2260476 tim=647299976554
'db file scattered read' ela= 22 file#=917 block#=225
blocks=7 obj#=2260476 tim=647299976646
'db file scattered read' ela= 29 file#=917 block#=232
blocks=8 obj#=2260476 tim=647299976759
'db file scattered read' ela= 24 file#=917 block#=241
blocks=7 obj#=2260476 tim=647299976866
'db file scattered read' ela= 23 file#=917 block#=248
blocks=8 obj#=2260476 tim=647299976956
'db file scattered read' ela= 128 file#=917 block#=258
blocks=32 obj#=2260476 tim=647299977200
'db file scattered read' ela= 95 file#=917 block#=290
blocks=32 obj#=2260476 tim=647299977511
'db file scattered read' ela= 97 file#=917 block#=322
blocks=32 obj#=2260476 tim=647299977822
'db file scattered read' ela= 87 file#=917 block#=354
blocks=30 obj#=2260476 tim=647299978113
'db file scattered read' ela= 96 file#=917 block#=386
blocks=32 obj#=2260476 tim=647299978407
'db file scattered read' ela= 108 file#=917 block#=418
blocks=32 obj#=2260476 tim=647299978719
'db file scattered read' ela= 94 file#=917 block#=450
blocks=32 obj#=2260476 tim=647299979021.
We made 2 db
file sequential read with blocks=1, and 23 db file scattered read with blocks
varied from 5 to 32, but summing up them together,
it is 342
disk reads. In this case, the maximum blocks=32 is probably dictated by db file
multiblock read count.
Note that
the second db file sequential read is to read undo data block (file#=3
block#=768 obj#=0).
It is
visible in Raw Trace, and counted in xplan statistics.
Look again
Dtrace output:
PROBEFUNC FD RETURN_SIZE COUNT
Pread 260 8192 1
pread 260 245760 1
readv 260
40960 1
pread 260
262144 6
readv 260
57344
7
lseek 260
0
8
readv 260
65536 8
PROBEFUNC FD MAX_READ_Blocks
readv 260 8
pread 260
32
TOTAL_SIZE = 2793472 , TOTAL_READ_Blocks = 341 ,
TOTAL_READ_CNT = 24
readv 260
value ----------------------------- Distribution --------------------------
count
8192 | 0
16384 |@@@@@@@@@@@@@@@@@@ 16
32768 | 0
pread 260
value ----------------------------- Distribution -----------------------------
count
16384 | 0
32768 |@@@@@@@@@@@@@@@@@@@@ 4
65536 |@@@@@@@@@@@@@@@ 3
131072 |@@@@@ 1
262144 |
0
Crosscheck Sql Raw Trace with Dtrace, we can see:
1 blocks=1 db file sequential read is implemented by 1 pread with
RETURN SIZE=8192.
1 blocks=5 db file scattered read is implemented by 1 readv with
RETURN SIZE=40960.
7 blocks=7 db file scattered read is implemented by 7 readv with
RETURN SIZE=57344.
8 blocks=8 db file scattered read is implemented by 8 readv with
RETURN SIZE=65536.
1 blocks=30 db file scattered read is implemented by 1 pread with
RETURN SIZE=245760.
6 blocks=32 db file scattered read is implemented by 6 pread with
RETURN SIZE=262144.
In total, we read 341 DB blocks by 24 OS read calls.
Sql Trace showed 25 (=23+2) reads to get 342 blocks, Dtrace
showed 24 reads to get 341 blocks, one extra
read in Sql Trace is due to one undo read (file#=3). (Note:
in Dtrace script, we only trace FD=260,
that is file#=917, undo file#=3 is not traced)
If we display segment extent allocations by two queries
below:
SQL > select segment_type, segment_subtype, header_block,
blocks, extents, initial_extent, next_extent from dba_segments v where
segment_name = 'TEST_TAB';
SEGMENT_TYPE SEGMENT_SU
HEADER_BLOCK
BLOCKS EXTENTS INITIAL_EXTENT
NEXT_EXTENT
-------------
---------- ------------ ------ ------- -------------- -----------
TABLE ASSM 130 10240
81 65536
1048576
SQL >
select blocks, count(*) cnt, min(extent_id), min(block_id) from dba_extents
where segment_name = 'TEST_TAB' group by blocks order by min(extent_id);
BLOCKS CNT MIN(EXTENT_ID)
MIN(BLOCK_ID)
------ --- -------------- -------------
8 16 0 128
128 63 16 256
1024 2 79 8320
and then
crosscheck with above raw trace file again. We can see that the first db file
sequential read (block#=130) by pread is to read segment header block (HEADER
BLOCK: 130), the next 16 db file scattered
read with
blocks between 5 and 8 by readv is to read all 16 initial extents (8 blocks per
extent), the rest 7 db file scattered read with blocks between 30 and 32 by pread
is to read incremental extents (128 blocks per extent).
The size of
incremental extents is 128 block, but each scattered read can read maximum 32 blocks
(db file multiblock read count=32).
Alireza
Kamrani
No comments:
Post a Comment