About Approximate Query Processing in Oracle 23ai
Approximate query processing is a set of
optimization techniques that speed analytic queries by calculating results
within an acceptable range of error, this feature was available from 12c.
History:
Oracle 12cR1 (12.1.0.2) → Introduced APPROX_COUNT_DISTINCT
aggregate function for faster distinct count estimations.
Oracle 12cR2 (12.2) → Expanded with additional
functions like APPROX_COUNT, APPROX_SUM, APPROX_RANK, etc.
Oracle 19c onward → Oracle added Approximate Query
Processing with Materialized Views and improved accuracy and performance.
Business intelligence (BI) queries heavily rely on sorts
that involve aggregate functions such as COUNT DISTINCT, SUM, RANK,
and MEDIAN.
For example, an application generates reports showing how
many distinct customers are logged on, or which products were most popular last
week. It is not uncommon for BI applications to have the following
requirements:
- Queries
must be able to process data sets that are orders of magnitude larger than
in traditional data warehouses.
For example, the daily volumes of web logs of a popular
website can reach tens or hundreds of terabytes a day.
- Queries
must provide near real-time response.
For example, a company requires quick detection and response
to credit card fraud.
- Explorative
queries of large data sets must be fast.
For example, a user might want to find out a list of
departments whose sales have approximately reached a specific threshold. A user
would form targeted queries on these departments to find more detailed
information, such as the exact sales number, the locations of these
departments, and so on.
For large data sets, exact aggregation queries consume
extensive memory, often spilling to temp space, and can be unacceptably slow.
Applications are often more interested in a general pattern
than exact results, so customers are willing to sacrifice
exactitude for speed.
For example, if the goal is to show a bar chart depicting
the most popular products, then whether a product sold 1 million units or .999
million units is statistically insignificant.
Oracle Database implements its solution through approximate
query processing. Typically, the accuracy of the approximate aggregation is
over 97% (with 95% confidence), but the processing time is orders of magnitude
faster. The database uses less CPU, and avoids the I/O cost of writing to temp
files.
Approximate
Query Initialization Parameters
You can implement approximate query processing without
changing existing code by using the APPROX_FOR_* initialization
parameters.
Set these parameters at the database or session level. The
following table describes initialization parameters and SQL functions relevant
to approximation techniques.
Table: Approximate
Query Initialization Parameters
Initialization Parameter |
Default |
Description |
APPROX_FOR_AGGREGATION |
FALSE |
Enables (TRUE) or
disables (FALSE) approximate query processing. This parameter acts as an
umbrella parameter for enabling the use of functions that return approximate
results. |
APPROX_FOR_COUNT_DISTINCT |
FALSE |
Converts COUNT(DISTINCT) to APPROX_COUNT_DISTINCT. |
APPROX_FOR_PERCENTILE |
none |
Converts eligible
exact percentile functions to their APPROX_PERCENTILE_* counterparts. |
Approximate query processing uses SQL functions to provide
real-time responses to explorative queries where approximations are acceptable.
The following table describes SQL functions that return
approximate results.
Table: Approximate
Query User Interface
SQL Function |
Description |
APPROX_COUNT |
Calculates the
approximate top n most common values when used with
the APPROX_RANK function. Returns the
approximate count of an expression. If you supply MAX_ERROR as the
second argument, then the function returns the maximum error between the
actual and approximate count. You must use this
function with a corresponding APPROX_RANK function in the HAVING clause.
If a query uses APPROX_COUNT, APPROX_SUM, or APPROX_RANK, then
the query must not use any other non-approximate aggregation functions. The following query
returns the 10 most common jobs within every department: SELECT department_id,
job_id, APPROX_COUNT(*) FROM employees GROUP BY
department_id, job_id HAVING APPROX_RANK ( PARTITION BY department_id ORDER BY APPROX_COUNT(*) DESC ) <= 10; |
APPROX_COUNT_DISTINCT |
Returns the
approximate number of rows that contain distinct values of an expression. |
APPROX_COUNT_DISTINCT_AGG |
Aggregates the
precomputed approximate count distinct synopses to a higher level. |
APPROX_COUNT_DISTINCT_DETAIL |
Returns the synopses
of the APPROX_COUNT_DISTINCT function as a BLOB. The database can
persist the returned result to disk for further aggregation. |
APPROX_MEDIAN |
Accepts a numeric or
date-time value, and returns an approximate middle or approximate
interpolated value that would be the middle value when the values are sorted. This function
provides an alternative to the MEDIAN function. |
APPROX_PERCENTILE |
Accepts a percentile
value and a sort specification, and returns an approximate interpolated value
that falls into that percentile value with respect to the sort specification. This function
provides an alternative to the PERCENTILE_CONT function. |
APPROX_RANK |
Returns the
approximate value in a group of values. This function takes
an optional PARTITION BY clause followed by a mandatory ORDER
BY ... DESC clause. The PARTITION BY key must be a subset of
the GROUP BY key. The ORDER BY clause must include
either APPROX_COUNT or APPROX_SUM. |
APPROX_SUM |
Calculates the
approximate top n accumulated values when used with
the APPROX_RANK function. If you supply MAX_ERROR as
the second argument, then the function returns the maximum error between the
actual and approximate sum. You must use this
function with a corresponding APPROX_RANK function in the HAVING clause.
If a query uses APPROX_COUNT, APPROX_SUM, or APPROX_RANK, then
the query must not use any other non-approximate aggregation functions. The following query
returns the 10 job types within every department that have the highest
aggregate salary: SELECT department_id,
job_id, APPROX_SUM(salary) FROM employees GROUP BY
department_id, job_id HAVING APPROX_RANK ( PARTITION BY department_id ORDER BY APPROX_SUM(salary) DESC ) <= 10; Note that APPROX_SUM returns
an error when the input is a negative number. |
No comments:
Post a Comment