Tuesday, August 26, 2025

About Approximate Query Processing in Oracle 23ai

About Approximate Query Processing in Oracle 23ai

Approximate query processing is a set of optimization techniques that speed analytic queries by calculating results within an acceptable range of error, this feature was available from 12c.

History:

Oracle 12cR1 (12.1.0.2) → Introduced APPROX_COUNT_DISTINCT aggregate function for faster distinct count estimations.

Oracle 12cR2 (12.2) → Expanded with additional functions like APPROX_COUNT, APPROX_SUM, APPROX_RANK, etc.

Oracle 19c onward → Oracle added Approximate Query Processing with Materialized Views and improved accuracy and performance.

 Approximate Query Processing in Action:

Business intelligence (BI) queries heavily rely on sorts that involve aggregate functions such as COUNT DISTINCT, SUM, RANK, and MEDIAN.

For example, an application generates reports showing how many distinct customers are logged on, or which products were most popular last week. It is not uncommon for BI applications to have the following requirements:

  • Queries must be able to process data sets that are orders of magnitude larger than in traditional data warehouses.

For example, the daily volumes of web logs of a popular website can reach tens or hundreds of terabytes a day.

  • Queries must provide near real-time response.

For example, a company requires quick detection and response to credit card fraud.

  • Explorative queries of large data sets must be fast.

For example, a user might want to find out a list of departments whose sales have approximately reached a specific threshold. A user would form targeted queries on these departments to find more detailed information, such as the exact sales number, the locations of these departments, and so on.

For large data sets, exact aggregation queries consume extensive memory, often spilling to temp space, and can be unacceptably slow. Applications are often more interested in a general pattern than exact results, so customers are willing to sacrifice exactitude for speed.

 

For example, if the goal is to show a bar chart depicting the most popular products, then whether a product sold 1 million units or .999 million units is statistically insignificant.

Oracle Database implements its solution through approximate query processing. Typically, the accuracy of the approximate aggregation is over 97% (with 95% confidence), but the processing time is orders of magnitude faster. The database uses less CPU, and avoids the I/O cost of writing to temp files.

 

Approximate Query Initialization Parameters

You can implement approximate query processing without changing existing code by using the APPROX_FOR_* initialization parameters.

Set these parameters at the database or session level. The following table describes initialization parameters and SQL functions relevant to approximation techniques.

 

Table: Approximate Query Initialization Parameters

Initialization Parameter

Default

Description

APPROX_FOR_AGGREGATION

FALSE

Enables (TRUE) or disables (FALSE) approximate query processing. This parameter acts as an umbrella parameter for enabling the use of functions that return approximate results.

APPROX_FOR_COUNT_DISTINCT

FALSE

Converts COUNT(DISTINCT) to APPROX_COUNT_DISTINCT.

APPROX_FOR_PERCENTILE

none

Converts eligible exact percentile functions to their APPROX_PERCENTILE_* counterparts.

  Approximate Query SQL Functions

Approximate query processing uses SQL functions to provide real-time responses to explorative queries where approximations are acceptable.

The following table describes SQL functions that return approximate results.

Table: Approximate Query User Interface

SQL Function

Description

APPROX_COUNT

Calculates the approximate top n most common values when used with the APPROX_RANK function.

Returns the approximate count of an expression. If you supply MAX_ERROR as the second argument, then the function returns the maximum error between the actual and approximate count.

You must use this function with a corresponding APPROX_RANK function in the HAVING clause. If a query uses APPROX_COUNT, APPROX_SUM, or APPROX_RANK, then the query must not use any other non-approximate aggregation functions.

The following query returns the 10 most common jobs within every department:

SELECT department_id, job_id,

       APPROX_COUNT(*)

FROM   employees

GROUP BY department_id, job_id

HAVING

  APPROX_RANK (

  PARTITION BY department_id

  ORDER BY APPROX_COUNT(*)

  DESC ) <= 10;

APPROX_COUNT_DISTINCT

Returns the approximate number of rows that contain distinct values of an expression.

APPROX_COUNT_DISTINCT_AGG

Aggregates the precomputed approximate count distinct synopses to a higher level.

APPROX_COUNT_DISTINCT_DETAIL

Returns the synopses of the APPROX_COUNT_DISTINCT function as a BLOB.

The database can persist the returned result to disk for further aggregation.

APPROX_MEDIAN

Accepts a numeric or date-time value, and returns an approximate middle or approximate interpolated value that would be the middle value when the values are sorted.

This function provides an alternative to the MEDIAN function.

APPROX_PERCENTILE

Accepts a percentile value and a sort specification, and returns an approximate interpolated value that falls into that percentile value with respect to the sort specification.

This function provides an alternative to the PERCENTILE_CONT function.

APPROX_RANK

Returns the approximate value in a group of values.

This function takes an optional PARTITION BY clause followed by a mandatory ORDER BY ... DESC clause. The PARTITION BY key must be a subset of the GROUP BY key. The ORDER BY clause must include either APPROX_COUNT or APPROX_SUM.

APPROX_SUM

Calculates the approximate top n accumulated values when used with the APPROX_RANK function.

If you supply MAX_ERROR as the second argument, then the function returns the maximum error between the actual and approximate sum.

You must use this function with a corresponding APPROX_RANK function in the HAVING clause. If a query uses APPROX_COUNT, APPROX_SUM, or APPROX_RANK, then the query must not use any other non-approximate aggregation functions.

The following query returns the 10 job types within every department that have the highest aggregate salary:

SELECT department_id, job_id,

       APPROX_SUM(salary)

FROM   employees

GROUP BY department_id, job_id

HAVING

  APPROX_RANK (

  PARTITION BY department_id

  ORDER BY APPROX_SUM(salary)

  DESC ) <= 10;

Note that APPROX_SUM returns an error when the input is a negative number.


No comments:

Post a Comment

Oracle CBO decisions for Join Order along with multiple tables

  Have you ever wondered how Oracle handles queries with multiple joins? How Oracle Find Best Orders of Tables in a large SQL statement? ...