Configure Online Redo Logs Appropriately
In
the previous post, I represented best practice to design and configure Optimal
Redo logs architecture. In this topic you can find optimal size for Redo logs.
Redo log switching has a
significant impact on redo transport and applies performance. Follow these best
practices for sizing the online redo logs on the primary and standby databases.
Following these guidelines for
online redo logs.
- All online redo log groups should have
identically sized logs (to the byte).
- Online redo logs should reside on high
performing disks (Isolated DATA disk groups).
- Create a minimum of three online redo log
groups per thread of redo on Oracle RAC instances.
- Create online redo log groups on shared
disks in an Oracle RAC environment.
- Multiplex online redo logs (multiple
members per log group) unless they are placed on high redundancy disk
groups.
- Size online redo logs to switch no more
than 12 times per hour (every ~5 minutes). When log switches occur
at a higher frequency, such as less than a minute, database hangs may
occur if database checkpointing or archiving rates are not keeping up with
peak redo generation rate. An online redo log cannot be reused unless
changes have been checkpointed, and the online redo log has been archived
if archivelog mode is enabled.
If redo logs are too small, then
frequent log switches occur, creating resource-intensive Oracle Database
check-pointing in which all of the dirty buffers are flushed.
A range of 10 to 15 minutes or
longer for log switching is preferable under a normal OLTP (Online Transaction
Processing) load. However, during periods of heavy DML (data manipulation
language) activity, such as during large Siebel EIM loads or upgrades, the logs
might switch more frequently than every two minutes. When this occurs, overall
database performance suffers as a result.
You can check the frequency of
this operation either in the alert log or by querying v$loghist. It is best to
use verification when there are the greatest activity and the heaviest load on
the database.
If this activity occurs too
frequently, then drop and recreate individual redo log groups with larger
sizes. A suggested minimum size is 300 MB.
The size of the redo log files can
influence performance, because the behaviors of the database writer and
archiver processes depend on the redo log sizes. Generally, larger redo log
files provide better performance. Undersized log files increase checkpoint
activity and reduce performance. However, small log files and frequent
checkpoints reduce recovery time. So, if daily operational efficiency is more
important than minimizing recovery time, then set online redo log files to a
relatively large value. Hundreds of MB is a normal size for Oracle
Hyperion Financial Management databases. But the preferred way to
determine the size of redo log files is to enable FAST_START_MTTR_TARGET, and
run a typical database workload for a while. Then run the following query to
obtain the optimal size of redo logs.
SQL> select
optimal_logfile_size from v$instance_recovery;
For more details about how to tune
MTTR target and the size of online redo files.
This column shows the redo log
file size in MB.The current setting of FAST_START_MTTR_TARGET helps to consider
the size of REDO LOG file. If OPTIMAL_LOGFILE_SIZE column value is greater than
the current log file, we can cosider this value as your log file size.
To set optimal value for FAST_START_MTTR_TARGET, we can run MTTR Advisor with
our suggested log file size.
The easier way to find optimal
value for FAST_START_MTTR_TARGET, we can set FAST_START_MTTR_TARGET to 1 and
query TARGET_MTTR value in V$INSTANCE_RECOVERY.
This can be considered as Lower value. Do the same by setting
FAST_START_MTTR_TARGET to 3600 to find the Upper value. We can choose the value
within this range. But we may have to test multiple values to fix the
optimal value.
FAST_START_MTTR_TARGET parameter and MTTR Advisory
Setting FAST_START_MTTR_TARGET parameter to a non-zero value and
STATISTICS_LEVEL to TYPICAL or ALL enables the MTTR Advisory.
The V$MTTR_TARGET_ADVICE view to check the statistics or advisories collected
by MTTR Advisor.
Size the redo logs based on the peak
redo generation rate of the primary database.
You can determine the peak rate by
running the query below for a period of time that includes the peak workload.
The peak rate could be seen at
month-end, quarter-end, or annually.
Size the redo logs to handle the
highest rate in order for redo apply to perform consistently during these
workloads.
SQL>
SELECT thread#,sequence#,blocks*block_size/1024/1024
MB,(next_time-first_time)*86400 sec, blocks*block_size/1024/1024)/((next_time-first_time)*86400)
"MB/s"
FROM v$archived_log WHERE
((next_time-first_time)*86400<>0) and first_time
between to_date('2015/01/15
08:00:00','YYYY/MM/DD HH24:MI:SS')
and to_date('2015/01/15 11:00:00','YYYY/MM/DD
HH24:MI:SS') and dest_id=1 order by first_time;
THREAD# SEQUENCE# MB SEC
MB/s
-------------- --------------- --------------
---------- ------------------
2
2291 29366.1963 831 35.338383
1
2565 29365.6553 781 37.6000708
2
2292 29359.3403 537 54.672887
1
2566 29407.8296 813 36.1719921
2
2293 29389.7012 678 43.3476418
2
2294 29325.2217 1236 23.7259075
1
2567 11407.3379 2658 4.29169973
2
2295 29452.4648 477 61.7452093
2
2296 29359.4458 954 30.7751004
2
2297 29311.3638 586 50.0193921
1
2568 3867.44092 5510 0.701894903
Choose the redo log size based on
the peak generation rate with the following chart. Repeat this query if
workload changes or additional pluggable database is added.
|
Peak Redo
Rate |
Recommended
Redo Log Size |
|
<= 5 MB/sec |
500 MB |
|
<= 25 MB/sec |
1 GB |
|
<= 75 MB/sec |
4 GB |
|
<= 150 MB/sec |
8 GB |
|
<= 300 MB/sec |
16 GB |
|
> 300 MB/sec |
32 GB |
You can measure it using this simple diagnostic query over a representative
period (e.g., peak hours, month-end close):
SQL> SELECT
thread#,sequence#,blocks*block_size/1024/1024 MB,(next_time-first_time)*86400
sec,
blocks*block_size/1024/1024)/((next_time-first_time)*86400)
"MB/s"
FROM
v$archived_log WHERE ((next_time-first_time)*86400<>0) and first_time
between
to_date('2015/01/15 08:00:00','YYYY/MM/DD HH24:MI:SS')
and
to_date('2015/01/15 11:00:00','YYYY/MM/DD HH24:MI:SS') and dest_id=1 order by
first_time;
THREAD# SEE# MB SEC
MB/s
-------------- ---------- ---------------- --------- --------------------
2 2291 29366.1963 831
35.338383
1 2565 29365.6553
781 37.6000708
2 2292 29359.3403 537
54.672887
1 2566 29407.8296
813 36.1719921
2 2293 29389.7012 678
43.3476418
2 2294 29325.2217 1236 23.7259075
1 2567 11407.3379 2658 4.29169973
2 2295
29452.4648 477 61.7452093 <<<***
2 2296 29359.4458
954 30.7751004
2 2297 29311.3638
586 50.0193921
1 2568
3867.44092
5510 0.701894903
Choose the redo log size based on the peak generation rate with the
following chart.
Repeat this query if workload changes or additional pluggable database is
added.
So, find the peak MB/s value that is 61.75; that’s your
highest redo generation rate during typical workload spikes. We can set redo
logs size to 4 GB.
Review this sizing whenever
workloads shift; especially if new pluggable databases are added or business
cycles intensify.
Pro Tip: Size for the worst
hour of the busiest day; not the average. That’s how you keep
redo apply consistent and avoid apply lag on standbys.
The “10–15 Minute Log Switch” Rule
Many DBAs learned to size redo logs so that each online log group lasts
roughly 10–15 minutes before switching.
The idea was to:
- Avoid too
frequent switches (which cause checkpoint overhead).
- But still switch often
enough to have manageable recovery times after crash or instance
failure.
- If a log switched
every 3–5 minutes, that was “too fast” → increase log size.
- If a log stayed
active for 30–40+ minutes, that was considered too long →
decrease log size (“shrink them”).
Why?
Because redo logs that are too large meant:
1.
Instance recovery takes longer
Oracle must roll forward and roll back more redo after crash.
2.
You don’t benefit from checkpointing enough
Log switches trigger checkpoints; too few switches = dirty blocks stay in
buffer cache too long.
3.
Archiving becomes infrequent
Archiver isn’t exercised consistently → some RAC/HA/backup processes prefer
steady log turnover.
So, the idea was:
If redo logs are so large that they only switch every 30–40 minutes, make
them smaller so switches happen more frequently (every 10–15 minutes).
However, …
That heuristic assumes a steady redo generation rate; which is no
longer realistic.
Why “Log Switch Duration” Isn’t Always Accurate Today
1.
Redo rate fluctuates heavily
In modern databases, workload spikes (e.g., batch jobs, peak business
hours, Data Guard syncs) can increase redo rate 10×.
So, 15 minutes at 2 MB/sec could drop to 3 minutes at 10 MB/sec.
2.
Data Guard and Standby Apply
For physical standby and MAA (Maximum Availability Architecture), what
matters is redo volume/throughput rate, not just time.
Under-specified logs cause latency in redo applies and RTO spikes.
3.
Log switch does not equal redo generation
performance
Switching triggers a checkpoint and archive operation, but redo transport
and I/O throughput depend on log size and rate, not time.
✅ Modern Best Practice (per Oracle and MAA guidance)
The most accurate method is:
Size redo logs based on your peak redo generation rate (MB/sec); not duration.
You can derive this rate from archived logs (as in that query we
discussed):
Then choose a redo log size that keeps log switches between 15–20
minutes during peak load; not low load.
For example:
|
Peak Redo Rate (MB/s) |
Redo Log Size |
Log Switch @ Peak (minutes) |
|
25 MB/s |
1 GB |
≈ 11 min |
|
75 MB/s |
4 GB |
≈ 17 min |
|
150 MB/s |
8 GB |
≈ 18 min |
So, the concept of 10–15 minutes still fits.
Factors Affecting
Redo Log Size
The first study identified six
workload characteristics that directly influence redo generation.
|
Number |
Factor |
|
1 |
the number of indexes |
|
2 |
the number of threads |
|
3 |
the type of SQL statement |
|
4 |
number of columns |
|
5 |
commit frequency |
|
6 |
cache size |
Why Does Oracle
Generate So Much Redo? It Isn't Just About Transactions.
One of the most common performance
discussions I have with Oracle teams starts with a familiar complaint:
"Our archive logs are growing
too fast."
The first reaction is usually to
increase redo log size, add archive destinations, or buy faster storage.
Those actions may reduce the
symptoms—but they rarely address the actual cause.
After reviewing several Oracle
performance studies and comparing them with real production environments, one
thing becomes clear:
Redo generation is
largely determined by application design and workload characteristics, not by
the redo logs themselves.
More indexes = More
redo
Every INSERT, UPDATE, or DELETE
doesn't only modify the table.
Oracle must also maintain every
affected index.
One experiment showed:
|
Indexes |
Redo Generated |
|
No indexes |
30 MB |
|
One index |
38 MB |
|
Seven indexes |
202 MB |
That jump isn't surprising when
you remember that every index change is protected by redo.
I've seen systems where tables
accumulated dozens of historical indexes that nobody was using anymore. Query
performance barely changed after cleanup, but redo generation dropped
noticeably.
The lesson is simple:
Every index has a
write cost.
Your SQL pattern
matters
Developers often choose between:
- UPDATE
- MERGE
- DELETE + INSERT
because they produce the same
business result.
Oracle doesn't see them as
equivalent.
The research showed that updating
existing rows with MERGE generated significantly less redo than
deleting and reinserting the same data.
From the database perspective,
preserving rows is usually much cheaper than recreating them.
Updating columns,
you don't need
ORM frameworks frequently update
every column in a row, even when only one value has changed.
That convenience has a price.
Updating all columns produced
nearly three times more redo than updating only the required columns.
If your application generates
massive redo, reviewing SQL generated by the application framework is often
more valuable than changing database parameters.
Commit frequency
isn't just about transactions
One interesting observation was
commit frequency.
Committing every row produced
substantially more redo than batching commits.
The lowest redo generation
occurred when commits were grouped instead of executed row by row.
Of course, commit frequency should
always follow business requirements; not performance alone; but unnecessary
micro-transactions can become surprisingly expensive.
Memory still matters
Increasing cache size consistently
reduced redo generation during testing.
A larger cache allows Oracle to
process changes more efficiently before writing them to disk.
This doesn't eliminate redo; it
simply reduces unnecessary overhead around generating and flushing it.
Hardware still plays
a role
Even after optimizing SQL and
application behavior, redo must eventually be written to storage.
That's why redo logs deserve the
fastest storage available.
The study also reinforces
something most experienced DBAs already know:
- RAID 1 or RAID 10 remains the preferred
choice for redo logs.
- RAID 5 introduces additional write
overhead because of parity calculations.
- Faster storage; especially SSD or NVMe or
adding Disk Spindles in raid level for HDD; provides much greater benefits
than simply adding more memory.
My biggest takeaway
When people talk about reducing
redo, they usually think about changing Oracle parameters.
In reality, the biggest
improvements often come from outside the database:
- Removing unnecessary indexes
- Writing better SQL
- Updating only changed columns
- Choosing efficient transaction patterns
- Reviewing commit strategy
- Using appropriate storage
Oracle is exceptionally efficient
at writing redo.
More often than not, it's our
applications that ask Oracle to generate more redo than necessary.
============Alireza Kamrani=========