Approximate Query Processing in Oracle Database (12.2)

The growing market for Big Data has created a lot of interest around approximate query processing (AQP) as a means of achieving interactive response times. In my previous post I provided an explanation of one of the more well-known approximate functions – APPROX_COUNT_DISTINCT. Here was an example of how approximate query processing obtains approximate results with negligible deviation from the exact result while dramatically saving processing resources. In this post I will describe how to use some of the other new features related to approximate query processing.

About Approximate Query Processing

Approximate query processing uses SQL functions to provide real-time responses to
explorative queries where approximations are acceptable. A query containing SQL
functions that return approximate results is referred to as an approximate query. Business  intelligence (BI) applications extensively use aggregate functions, including
analytic functions, to provide answers to common business queries. For some types of
queries, when the data set is extremely large, providing exact answers can be resource
intensive.
images (1)
For example, counting the number of unique customer sessions on a website
or establishing the median house price within each post code across a state. In certain
scenarios, these types of queries may not require exact answers because you are more
interested in approximate trends or patterns, which can then be used to drive further
analysis. Approximate query processing is primarily used in data discovery
applications to return quick answers to explorative queries. Users typically want to
locate interesting data points within large amounts of data and then drill down to
uncover further levels of detail. For explorative queries, quick responses are more
important than exact values.

Oracle provides a set of SQL functions that enable you to obtain approximate results with negligible deviation from the exact result. There are additional approximate functions that support materialized view based summary aggregation strategies. The functions that provide approximate results are as follows:

  • APPROX_COUNT_DISTINCT
  • APPROX_COUNT_DISTINCT_DETAIL
  • APPROX_COUNT_DISTINCT_AGG
  • TO_APPROX_COUNT_DISTINCT
  • APPROX_MEDIAN
  • APPROX_PERCENTILE
  • APPROX_PERCENTILE_DETAIL
  • APPROX_PERCENTILE_AGG
  • TO_APPROX_PERCENTILE
Approximate query processing can be used without any changes to your existing
code. When you set the appropriate initialization parameters, Oracle Database
replaces exact functions in queries with the corresponding SQL functions that return
approximate results.

Running Queries Containing Exact Functions Using SQL Functions that Return Approximate Values

The interesting part is that you can utilize the approximate functions without changing code. There are three initialization parameters introduced to control which functions should be treated as an approximate function during runtime:

The initialization parameters are:

  • approx_for_aggregation
  • approx_for_count_distinct
  • approx_for_percentile

To replace only the COUNT(DISTINCT) function in queries with the APPROX_COUNT_DISTINCT function, set the approx_for_count_distinct initialization parameter to TRUE for the current session or for the entire database.

To replace percentile functions with the corresponding functions that return approximate results, set approx_for_percentile to PERCENTILE_CONT, PERCENTILE_DISC, or ALL (replaces all percentile functions) for the current session or for the entire database.

To run all queries using the corresponding SQL functions that return approximate results instead of the specified SQL functions, set the approx_for_aggregation initialization parameter to TRUE for the current session or for the entire database.

When Should Approximate Query Processing (AQP) Be Used?

If you work in a billing department, stop reading right now 🙂

1. Use AQP when you care about interactive response times
Exploratory data analytics is typically an interactive and iterative process: you form an initial hypothesis (e.g., by visualizing and looking at the data), running some queries, modifying your queries based on the previous answers, and so on until you find a satisfactory explanation. You cannot be productive if you have to wait half an hour every time you run a new query to test a hypothesis.

2. Use AQP when you can make perfect decisions with imperfect answers
Exact results have no advantage over approximate ones if they both lead to the exact same conclusion/decision. This obviously depends on the quality of your approximation and the application logic.

3. Use AQP when your data is incomplete or noisy
Believe it not, most of the data that is collected and used in the real world is extremely noisy. So the idea that processing your entire data gives you 100% accurate answers is usually an illusion anyway. In other words, if my data is noisy to begin with, I might as well take advantage of approximation, which uses a controlled degree of noise in my computation in exchange for significant speed ups. This noise in your data can be because of many reasons – human error, missing values, white noise, data extraction errors, data conversion errors.

4. Use AQP when your goal is to predict something
In general, there are many cases where the output of database queries are consumed by predictive analytics or machine learning algorithms, and in those cases, returning smaller samples of the original data can be a great solution for gaining considerable performance benefits.

Additional Reading: Oracle Database New Features Guide 12c Release 2 (12.2)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s