Database-Box: About Quarantined SQL Plans

About Quarantined SQL Plans

You can configure Oracle Database to automatically quarantine the plans for SQL statements terminated by Oracle Database Resource Manager (the Resource Manager) for exceeding resource limits.

How SQL Quarantine Works

The Resource Manager can set a maximum estimated execution time for a SQL statement, for example, 20 minutes. If a statement execution exceeds this limit, then the Resource Manager terminates the statement. However, the statement may run repeatedly before being terminated, wasting 20 minutes of resources each time it is executed.

The SQL Quarantine infrastructure (SQL Quarantine) solves the problem of repeatedly wasting resources. If a statement exceeds the specified resource limit, then the Resource Manager terminates the execution and “quarantines” the plan. To quarantine the plan means to put it on a blocklist of plans that the database will not execute for this statement. Note that the plan for a terminated statement is quarantined, not the statement itself.

The query in our example runs for 20 minutes only once, and then never again—unless the resource limit increases or the plan changes. If the limit is increased to 25 minutes, then the Resource Manager permits the statement to run again with the quarantined plan. If the statement runs for 23 minutes, which is below the new threshold, then the Resource Manager removes the plan from quarantine. If the statement runs for 26 minutes, which is above the new threshold, then the plan remains quarantined unless the limit is increased.

SQL Quarantine User Interface

The DBMS_SQLQ PL/SQL package enables you to manually create quarantine configurations for execution plans by specifying thresholds for consuming system resources. For example, you can enable a quarantine threshold of 10 seconds for CPU time or drop the threshold for I/O requests. You can also immediately save the quarantine information to disk or drop configurations.

To enable SQL Quarantine to create configurations automatically after the Resource Manager terminates a query, set the OPTIMIZER_CAPTURE_SQL_QUARANTINE initialization parameter to true (the default is false). To disable the use of existing SQL Quarantine configurations, set OPTIMIZER_USE_SQL_QUARANTINE to false (the default is true).

The V$SQL.SQL_QUARANTINE column indicates whether a plan was quarantined for a statement after the Resource Manager canceled execution.

The AVOIDED_EXECUTIONS column indicates how often Oracle Database prevented the statement from running with the quarantined plan.

About the Expression Statistics Store (ESS)

The Expression Statistics Store (ESS) is a repository maintained by the optimizer to store statistics about expression evaluation.

When an IM column store is enabled, the database leverages the ESS for its In-Memory Expressions (IM expressions) feature. However, the ESS is independent of the IM column store. The ESS is a permanent component of the database and cannot be disabled.

The database uses the ESS to determine whether an expression is “hot” (frequently accessed), and thus a candidate for an IM expression. During a hard parse of a query, the ESS looks for active expressions in the SELECT list, WHERE clause, GROUP BY clause, and so on.

For each segment, the ESS maintains expression statistics such as the following:

Frequency of execution
Cost of evaluation
Timestamp evaluation

The optimizer assigns each expression a weighted score based on cost and the number of times it was evaluated. The values are approximate rather than exact. More active expressions have higher scores. The ESS maintains an internal list of the most frequently accessed expressions.

The ESS resides in the SGA and also persists on disk. The database saves the statistics to disk every 15 minutes, or immediately using the DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO procedure. The ESS statistics are visible in the DBA_EXPRESSION_STATISTICS view.

SQL Quarantine

SQL Quarantine in Oracle Database 23c (introduce in 19c above) is a feature that automatically prevents runaway or resource-intensive SQL statements from consuming excessive system resources by quarantining their execution plans. When a SQL statement exceeds predefined resource limits (e.g., CPU time, elapsed time) as set in Oracle Resource Manager, its execution plan can be automatically quarantined.

Key aspects of SQL Quarantine:

Automatic Creation:

If OPTIMIZER_CAPTURE_SQL_QUARANTINE is enabled, the database automatically creates a SQL Quarantine configuration for the execution plan of a SQL statement that is terminated by Resource Manager due to exceeding resource limits.

Preventing Future Executions:

Once an execution plan is quarantined, subsequent attempts to execute the same SQL statement using that specific quarantined plan will be prevented or terminated, depending on the configuration. This helps to maintain system stability and prevent performance degradation.

Configuration with DBMS_SQLQ: such as DBMS_SQLQ.CREATE_QUARANTINE_BY_SQL_ID

The DBMS_SQLQ package provides an interface for manually configuring and managing SQL Quarantine. This allows for fine-grained control over which SQL plans are quarantined and under what conditions. You can specify thresholds for resources like CPU time or elapsed time that, if exceeded, will trigger the quarantine.

Integration with Resource Manager:

SQL Quarantine works in conjunction with Oracle Resource Manager. Resource Manager defines the consumer groups and resource limits, while SQL Quarantine enforces these limits by preventing problematic SQL plans from running.

Monitoring:

The V$SQL view includes a QUARANTINED column, indicating if a SQL statement's plan has been quarantined. The AVOIDED_EXECUTION column tracks the number of times execution attempts were made after a plan was quarantined.

Scope:

The quarantine applies to a specific execution plan, not the SQL statement itself. If the SQL statement can be executed with a different, non-quarantined plan, it will be allowed to run.

Behavior with Increased Limits:

If the resource limits in the Resource Manager plan are increased, a previously quarantined SQL statement might be allowed to run again with its quarantined plan. If it then runs within the new limits, the quarantine can be removed.

Managing Transactions

Managing transactions include tasks such as setting transaction priority and automatically rolling back transactions.

Priority Transactions
The Oracle database allows transactions to be automatically rolled back and includes parameters to control this behavior.
Automatic Transaction Quarantine
Oracle Database quarantines, or isolates, the recovery of transactions that could potentially cause a system crash. These transactions must be manually resolved by the DBA so that row locks are released.

Priority Transactions

The Oracle database allows transactions to be automatically rolled back and includes parameters to control this behavior.

A row lock is a lock on a single row of a table. A transaction acquires a row lock for each row modified by one of the following statements: INSERT, UPDATE, DELETE, MERGE, and SELECT ... FOR UPDATE. The row lock exists until the transaction commits or rolls back. Transactions can hold row locks for a long duration in certain cases. For example, the application modifies some rows but doesn't commit or terminate the transaction because of an exception in the application. Traditionally, when a transaction is blocked on a rowlock by another transaction for a long time, it required the database administrator to manually terminate the blocking transaction by using the ALTER SYSTEM KILL SESSION command.

Starting with Oracle Database 26ai, the database provides parameters to control when and which transactions holding rowlocks can be automatically rolled back. Oracle database rolls back the transaction but the session stays alive. The application must acknowledge the automatic rollback of the transaction by issuing a ROLLBACK SQL statement.

Applications can specify the priority of their transactions. If a low priority transaction blocks a high priority transaction on rowlocks, Oracle database will automatically roll back the low priority transaction to let the high priority transaction(s) progress. The

database administrator can configure the time after which the low priority transaction is rolled back.

Automatic Transaction Quarantine

Oracle Database quarantines, or isolates, the recovery of transactions that could potentially cause a system crash. These transactions must be manually resolved by the DBA so that row locks are released.

About Redo Application

Database buffers in the buffer cache in the SGA are written to disk only when necessary, using a least-recently-used (LRU) algorithm. Because of the way that the database writer process uses this algorithm to write database buffers to datafiles, datafiles may contain some data blocks modified by uncommitted transactions and some data blocks missing changes from committed transactions.

Crash Recovery and Instance Recovery

Crash recovery is used to recover from a failure either when a single-instance database crashes or all instances of an Oracle Real Application Clusters database crashes. Instance recovery refers to the case where a surviving instance recovers a failed instance in an Oracle Real Application Clusters database.

The goal of crash and instance recovery is to restore the data block changes located in the cache of the dead instance and to close the redo thread that was left open. Instance and crash recovery use only online redo log files and current online datafiles.

Two potential problems can result if an instance failure occurs:

Data blocks modified by a transaction might not be written to the datafiles at commit time and may only appear in the redo log. Therefore, the redo log contains changes that must be reapplied to the database during recovery.
After the roll forward phase, the datafiles may contain changes that had not been committed at the time of the failure. These uncommitted changes must be rolled back to ensure transactional consistency. These changes were either saved to the datafiles before the failure or introduced during the roll forward phase.

To solve this dilemma, two separate steps are generally used by Oracle for a successful recovery of a system failure: rolling forward with the redo log (cache recovery) and rolling back with the rollback or undo segments (transaction recovery).

Cache Recovery

The online redo log is a set of operating system files that record all changes made to any database buffer, including data, index, and rollback segments, whether the changes are committed or uncommitted. All changes to Oracle blocks are recorded in the online log.

The first step of recovery from an instance or disk failure is called cache recovery or rolling forward and involves reapplying all of the changes recorded in the redo log to the datafiles. Because rollback data is also recorded in the redo log, rolling forward also regenerates the corresponding undo segments.

Rolling forward proceeds through as many redo log files as necessary to bring the database forward in time. Rolling forward usually includes online redo log files (instance recovery or media recovery) and may include archived redo log files (media recovery only).

After rolling forward, the data blocks contain all committed changes. They may also contain uncommitted changes that were either saved to the datafiles before the failure or were recorded in the redo log and introduced during cache recovery.

Transaction Recovery

Undo tablespaces (in automatic undo management mode) contain undo segments that record the before-image of changes to the database. In database recovery, the undo blocks inside the undo segments roll back the effects of uncommitted transactions previously applied by the rolling forward phase.

After the roll forward, any changes that were not committed must be undone. Oracle applies undo blocks to roll back uncommitted changes in data blocks that were either written before the crash or introduced by redo application during cache recovery. This process of rolling back uncommitted transactions in the database is called transaction recovery.

The following figure illustrates rolling forward and rolling back, the two steps necessary to recover from any type of system failure.

Failure During Transaction Recovery

Transaction recovery can fail due to the following reasons:

Physical data corruption of database blocks (ORA-01578, ORA-28304)
Logical data corruption (ORA-00600)
Memory corruption (ORA-00602, ORA-07445)
State Corruptions (ORA-00600)

A failure during transaction recovery can be irrecoverable to the entire database instance and bring down the entire container database (CDB) including its pluggable databases. Inability to recover all the transactions in the system leads to rowlocks being held by unrecovered transactions for longer. This severely impacts critical business operations.

Starting with Oracle Database 26ai, transactions that fail to recover are quarantined and left un-recovered until the DBA can resolve the issue. This increases the availability of the database. The Database Developer is notified about the quarantined transaction and must take immediate action so that the row locks held by quarantined transactions can be released.

Transaction quarantines are maintained in a persistent data dictionary table inside the database. Therefore, you can manage quarantines from any RAC instance in the database.

When a DML operation tries to access rows locked by a quarantined transaction error ORA-60451 will be raised as the DML operation cannot be executed while the rows are still locked.

Quarantined Transaction and Replication

Since Oracle Data Guard uses logical replication, quarantine metadata is not replicated to the standby server when using Oracle Data Guard. Therefore, contents of transaction quarantine views, such as DBA_QUARANTINED_TRANSACTIONS, on the standby server may be different than the entries on the primary server.

When running with Active Data Guard (ADG), the replication is physical which means that for the transaction quarantine feature, both the dead transaction and the catalog representation of the quarantine will be replicated to the standby database.

Monitoring Quarantined Transactions

Alerts and data dictionary views warn the database developer of quarantined transactions.

Oracle Database warns DBAs of quarantined transactions in several ways, which include:

ALERT_QUE - the transaction quarantine alert is sent to the persistent alert queue SYS.ALERT_QUE. This alert is automatically displayed in the data dictionary views DBA_OUTSTANDING_ALERTS and DBA_ALERT_HISTORY, as well as Enterprise Manager Cloud Control and the AWR report.
Attention log - introduced in Oracle 21c, the attention log contains information about critical and highly visible database events. Starting with Oracle Database 26ai, it includes the transaction quarantine information as well.
Alert log - an incident will be generated for the internal error and traced in the alert log. The DBA can monitor the quarantine incident in V$DIAG_ALERT_EXT.

Views named DBA_QUARANTINED_TRANSACTIONS and CDB_QUARANTINED_TRANSACTIONS monitor all active quarantined transactions. These views provide all the necessary information to resolve the quarantine.

The view DBA_QUARANTINED_TRANSACTIONS view can be joined with GV$TRANSACTION and GV$FAST_START_TRANSACTIONS to get the details of the transaction and its recovery progress. Note that GV$TRANSACTION will lose its information on a database instance restart because fixed views are not persistent. Since transaction recovery begins after a database instance restart, GV$TRANSACTION shows the progress of any active transaction recovery even after a database restart.

Resolving Quarantined Transactions

The database developer will be alerted when a transaction quarantine is generated. Quarantines should be monitored and resolved quickly to prevent row locks from being held for a long time.

Quarantines can be monitored using DBA_QUARANTINED_TRANSACTIONS. The REASON column of the view shows why the transaction was quarantined. For example:

SQL> select usn, slt, sqn, reason, undo_record_objn

from dba_quarantined_transactions;

USN SLT SQN REASON UNDO_RECORD_OBJN

------ ------ ------ ---------------------- -------------------

6 18 10 ORA-00600[ktubko_1] 73646

7 20 13 ORA-28304 73650

Once the reason for the transaction quarantines has been identified (ORA-00600[ktubko_1] and ORA-28304 in the example above), then refer to the Primary MOS note for Automatic Transaction Quarantine (Doc ID 3005962.1) where detailed instructions are provided for how to resolve the different causes of transaction quarantines.

Dropping Quarantined Transactions

After the issue related to the quarantine has been fixed, the quarantine must be dropped.

Transaction Recovery cannot be retried until the issue concerning the transaction quarantine is fixed. Therefore, after fixing the quarantine with the corrective action, the quarantine must be manually dropped for transaction recovery to restart for the quarantined transaction. The following DDL syntax can be used to drop the quarantine:

ALTER DATABASE DROP TRANSACTION QUARANTINE

<xid_undo_seg_no> <xid_slot_no> <xid_sequence_no>;

where

xid_undo_seg_no is the undo segment number of the quarantined transaction (USN column of view DBA_QUARANTINED_TRANSACTIONS)
xid_slot_no is the slot number of the quarantined transaction (SLT column of view DBA_QUARANTINED_TRANSACTIONS)
xid_sequence_no is the sequence number of the quarantined transaction (SQN column of view DBA_QUARANTINED_TRANSACTIONS)

For example, to drop the quarantine for xid 8.20.275, use command:

ALTER DATABASE DROP TRANSACTION QUARANTINE 8 20 275;

Transaction Quarantine Escalation

When the transaction quarantine limit is reached (default of 3) for a PDB, it is automatically shut down on all RAC instances so that the database developer can resolve the issue. The other PDBs in the CDB are not affected.

Transaction quarantine is designed to help in cases when the failure, such as memory, data, or state corruption, is confined to a single transaction. That is, the inactive transaction that fails to recover is quarantined, other inactive transactions can be recovered, and there's no need to shut down the PDB or the CDB.

When failures happen across multiple transactions or span the entire PDB, such as physical corruption of multiple blocks, a PDB SGA corruption, or a logical data corruption due to an internal error, quarantining the failed inactive transaction recovery may or may not help. It depends on whether the root cause for those failures is the same or not, because recovering other inactive transactions might run into the same issue. The system keeps on running in an inconsistent state even after quarantining a few transactions. It can be dangerous when the failure is due to logical data corruption, because it spreads over time. To prevent this from happening, there is a transaction quarantine limit of three (3), after which the quarantine is escalated to the database level and the PDB will be terminated using shutdown abort if archive

logging is enabled for the PDB and it is feasible to shut down the PDB. Transaction recovery for the PDB is automatically disabled so that the database developer can correct problems on the next PDB startup.

When an escalation occurs, perform the following steps:

Open the PDB.
Query the view DBA_QUARANTINED_TRANSACTIONS to get information about the quarantined transactions.
For each quarantined transaction in the database, resolve the cause of the transaction quarantine (Resolving Quarantined Transactions) and then drop the transaction quarantine (see Dropping Quarantined Transactions).
Enable transaction recovery for the PDB.

To enable transaction recovery, use the command:

ALTER SYSTEM SET TRANSACTION_RECOVERY=ENABLED sid='*';

The SCOPE clause is not necessarily required. The default values for SCOPE are:

For PDBs, the default value is SCOPE=BOTH.
For CDB$ROOT, if a server parameter file was used to start the database, then the default is SCOPE=BOTH. If a parameter file was used to start the database, then the default is SCOPE=MEMORY.

These default values for SCOPE will re-enable transaction recovery for automatic transaction quarantine.

To determine if transaction quarantines were escalated to the PDB, alerts are published to all the alert channels described in Monitoring Quarantined Transactions (SYS.ALERT_QUE, Attention log, and Alert log).

Alireza Kamrani

Database-Box

Sunday, November 9, 2025

About Quarantined SQL Plans

No comments:

Post a Comment

Fast Connection Node Affinity Awareness in Oracle RAC & RMAN

Report Abuse

Labels