Database-Box: About Approximate Query Processing in Oracle 23ai

About Approximate Query Processing in Oracle 23ai

Approximate query processing is a set of optimization techniques that speed analytic queries by calculating results within an acceptable range of error, this feature was available from 12c.

History:

Oracle 12cR1 (12.1.0.2) → Introduced APPROX_COUNT_DISTINCT aggregate function for faster distinct count estimations.

Oracle 12cR2 (12.2) → Expanded with additional functions like APPROX_COUNT, APPROX_SUM, APPROX_RANK, etc.

Oracle 19c onward → Oracle added Approximate Query Processing with Materialized Views and improved accuracy and performance.

Approximate Query Processing in Action:

Business intelligence (BI) queries heavily rely on sorts that involve aggregate functions such as COUNT DISTINCT, SUM, RANK, and MEDIAN.

For example, an application generates reports showing how many distinct customers are logged on, or which products were most popular last week. It is not uncommon for BI applications to have the following requirements:

Queries must be able to process data sets that are orders of magnitude larger than in traditional data warehouses.

For example, the daily volumes of web logs of a popular website can reach tens or hundreds of terabytes a day.

Queries must provide near real-time response.

For example, a company requires quick detection and response to credit card fraud.

Explorative queries of large data sets must be fast.

For example, a user might want to find out a list of departments whose sales have approximately reached a specific threshold. A user would form targeted queries on these departments to find more detailed information, such as the exact sales number, the locations of these departments, and so on.

For large data sets, exact aggregation queries consume extensive memory, often spilling to temp space, and can be unacceptably slow. Applications are often more interested in a general pattern than exact results, so customers are willing to sacrifice exactitude for speed.

For example, if the goal is to show a bar chart depicting the most popular products, then whether a product sold 1 million units or .999 million units is statistically insignificant.

Oracle Database implements its solution through approximate query processing. Typically, the accuracy of the approximate aggregation is over 97% (with 95% confidence), but the processing time is orders of magnitude faster. The database uses less CPU, and avoids the I/O cost of writing to temp files.

Approximate Query Initialization Parameters

You can implement approximate query processing without changing existing code by using the APPROX_FOR_* initialization parameters.

Set these parameters at the database or session level. The following table describes initialization parameters and SQL functions relevant to approximation techniques.

Table: Approximate Query Initialization Parameters

Initialization Parameter	Default	Description
APPROX_FOR_AGGREGATION	FALSE	Enables (TRUE) or disables (FALSE) approximate query processing. This parameter acts as an umbrella parameter for enabling the use of functions that return approximate results.
APPROX_FOR_COUNT_DISTINCT	FALSE	Converts COUNT(DISTINCT) to APPROX_COUNT_DISTINCT.
APPROX_FOR_PERCENTILE	none	Converts eligible exact percentile functions to their APPROX_PERCENTILE_* counterparts.

Approximate Query SQL Functions

Approximate query processing uses SQL functions to provide real-time responses to explorative queries where approximations are acceptable.

The following table describes SQL functions that return approximate results.

Table: Approximate Query User Interface

SQL Function	Description
APPROX_COUNT	Calculates the approximate top n most common values when used with the APPROX_RANK function. Returns the approximate count of an expression. If you supply MAX_ERROR as the second argument, then the function returns the maximum error between the actual and approximate count. You must use this function with a corresponding APPROX_RANK function in the HAVING clause. If a query uses APPROX_COUNT, APPROX_SUM, or APPROX_RANK, then the query must not use any other non-approximate aggregation functions. The following query returns the 10 most common jobs within every department: SELECT department_id, job_id, APPROX_COUNT() FROM employees GROUP BY department_id, job_id HAVING APPROX_RANK ( PARTITION BY department_id ORDER BY APPROX_COUNT() DESC ) <= 10;
APPROX_COUNT_DISTINCT	Returns the approximate number of rows that contain distinct values of an expression.
APPROX_COUNT_DISTINCT_AGG	Aggregates the precomputed approximate count distinct synopses to a higher level.
APPROX_COUNT_DISTINCT_DETAIL	Returns the synopses of the APPROX_COUNT_DISTINCT function as a BLOB. The database can persist the returned result to disk for further aggregation.
APPROX_MEDIAN	Accepts a numeric or date-time value, and returns an approximate middle or approximate interpolated value that would be the middle value when the values are sorted. This function provides an alternative to the MEDIAN function.
APPROX_PERCENTILE	Accepts a percentile value and a sort specification, and returns an approximate interpolated value that falls into that percentile value with respect to the sort specification. This function provides an alternative to the PERCENTILE_CONT function.
APPROX_RANK	Returns the approximate value in a group of values. This function takes an optional PARTITION BY clause followed by a mandatory ORDER BY ... DESC clause. The PARTITION BY key must be a subset of the GROUP BY key. The ORDER BY clause must include either APPROX_COUNT or APPROX_SUM.
APPROX_SUM	Calculates the approximate top n accumulated values when used with the APPROX_RANK function. If you supply MAX_ERROR as the second argument, then the function returns the maximum error between the actual and approximate sum. You must use this function with a corresponding APPROX_RANK function in the HAVING clause. If a query uses APPROX_COUNT, APPROX_SUM, or APPROX_RANK, then the query must not use any other non-approximate aggregation functions. The following query returns the 10 job types within every department that have the highest aggregate salary: SELECT department_id, job_id, APPROX_SUM(salary) FROM employees GROUP BY department_id, job_id HAVING APPROX_RANK ( PARTITION BY department_id ORDER BY APPROX_SUM(salary) DESC ) <= 10; Note that APPROX_SUM returns an error when the input is a negative number.

Database-Box

Tuesday, August 26, 2025

About Approximate Query Processing in Oracle 23ai

No comments:

Post a Comment

Understanding Oracle's Resource Cost and Session Limiting with COMPOSITE_LIMIT

Report Abuse

Labels