Now working towards the OCP…
|Range of values||
DEFERRED_SEGMENT_CREATION specifies the semantics of deferred segment creation. The design goal here is to prevent hundreds or thousands of segments being created by a 3rd party application that only uses a few of the tables it creates. If set to
true, then segments for tables and their dependent objects (LOBs, indexes) will not be created until the first row is inserted into the table. Before creating a set of tables (or installing an application), if it is known that a significant number of them will not be populated, then consider setting this parameter to
true. This saves disk space and minimises install time. This feature was first introduced in 11g.
Deferred segment creation allows tables and indexes to be created without physical segments which can be a double-edged sword as it allows objects to be created even if there is no space available for those objects in the specified tablespace. The data dictionary contains the object information so a query of DBA_TABLES, DBA_INDEXES or any of the %TAB% and %IND% tables and views will return the expected information. Querying DBA_SEGMENTS, however, may fail to return data since an object may not have any segments associated with it.
- at the database level via your init/spfile
- at the session level via “alter session set deferred_segment_creation = false;”
- statement by statement via “create table t ( x int ) segment creation immediate;”
Should you use deferred segment creation? That choice is one both the DBA team and development team need to make. Look at all aspects of the argument then decide accordingly. Remember that the ‘right’ answer is the one that’s suited to your environment.
Here’s something to add to your security threat model: backups. Why steal live data and when you can drive away with exact replica?
— Kevin Beaumont (@GossiTheDog) February 15, 2017
About Approximate Query Processing
Oracle provides a set of SQL functions that enable you to obtain approximate results with negligible deviation from the exact result. There are additional approximate functions that support materialized view based summary aggregation strategies. The functions that provide approximate results are as follows:
Running Queries Containing Exact Functions Using SQL Functions that Return Approximate Values
The interesting part is that you can utilize the approximate functions without changing code. There are three initialization parameters introduced to control which functions should be treated as an approximate function during runtime:
The initialization parameters are:
To replace only the
COUNT(DISTINCT) function in queries with the
APPROX_COUNT_DISTINCT function, set the
approx_for_count_distinct initialization parameter to
TRUE for the current session or for the entire database.
To replace percentile functions with the corresponding functions that return approximate results, set
ALL (replaces all percentile functions) for the current session or for the entire database.
To run all queries using the corresponding SQL functions that return approximate results instead of the specified SQL functions, set the
approx_for_aggregation initialization parameter to
TRUE for the current session or for the entire database.
When Should Approximate Query Processing (AQP) Be Used?
If you work in a billing department, stop reading right now 🙂
1. Use AQP when you care about interactive response times
Exploratory data analytics is typically an interactive and iterative process: you form an initial hypothesis (e.g., by visualizing and looking at the data), running some queries, modifying your queries based on the previous answers, and so on until you find a satisfactory explanation. You cannot be productive if you have to wait half an hour every time you run a new query to test a hypothesis.
2. Use AQP when you can make perfect decisions with imperfect answers
Exact results have no advantage over approximate ones if they both lead to the exact same conclusion/decision. This obviously depends on the quality of your approximation and the application logic.
3. Use AQP when your data is incomplete or noisy
Believe it not, most of the data that is collected and used in the real world is extremely noisy. So the idea that processing your entire data gives you 100% accurate answers is usually an illusion anyway. In other words, if my data is noisy to begin with, I might as well take advantage of approximation, which uses a controlled degree of noise in my computation in exchange for significant speed ups. This noise in your data can be because of many reasons – human error, missing values, white noise, data extraction errors, data conversion errors.
4. Use AQP when your goal is to predict something
In general, there are many cases where the output of database queries are consumed by predictive analytics or machine learning algorithms, and in those cases, returning smaller samples of the original data can be a great solution for gaining considerable performance benefits.
Additional Reading: Oracle Database New Features Guide 12c Release 2 (12.2)
Many organisations like to count things. Counting the number of distinct elements (i.e. the cardinality) of a set is a challenge when the cardinality of the set is large. I see many use cases where an approximate result for a given query would be sufficient, rather than a completely accurate result. For example, you may have a query: “How many distinct individuals visited our website last week?”. Do you need an exact number? I would like to bring your attention to a new SQL function introduced in 12c – APPROX_COUNT_DISTINCT()
The APPROX_COUNT_DISTINCT function was added, but not documented, in Oracle 11g to improve the speed of calculating the number of distinct values (NDV) when gathering statistics using the DBMS_STATS package. Oracle Database 12c (22.214.171.124) now includes the function in the documentation so we are free to use it in our applications as a supported SQL function.
APPROX_COUNT_DISTINCT returns the approximate number of rows that contain distinct values of
This function provides an alternative to the
) function, which returns the exact number of rows that contain distinct values of
APPROX_COUNT_DISTINCT processes large amounts of data significantly faster than
COUNT, with negligible deviation from the exact result.
expr, you can specify a column of any scalar data type other than
APPROX_COUNT_DISTINCT ignores rows that contain a null value for
expr. This function returns a
In previous database versions, if we wanted to perform a count of distinct values, we would probably have done something like the following.
SELECT COUNT(DISTINCT object_name) AS obj_count FROM all_objects; OBJ_COUNT ---------- 47171 1 row selected.
This query gives us the exact count of the distinct values based on Oracle’s read-consistency model. As such, we see all the committed data, along with any uncommitted changes made by the current session.
In contrast, the new
APPROX_COUNT_DISTINCT function does not provide exact results, but should give “negligible deviation from the exact result” according to the documentation.
SELECT APPROX_COUNT_DISTINCT(object_name) AS obj_count FROM all_objects; OBJ_COUNT ---------- 46788 1 row selected.
DROP TABLE t1 PURGE; CREATE TABLE t1 AS SELECT level AS data FROM dual CONNECT BY level <= 10000; INSERT /*+ APPEND */ INTO t1 SELECT a.data FROM t1 a CROSS JOIN t1 b; COMMIT; EXEC DBMS_STATS.gather_table_stats(USER,'T1');
Now we have a table with a little over 100 million rows and 10,000 distinct values. We can see the difference in performance of the two methods is more significant.
SET TIMING ON SELECT COUNT(DISTINCT data) AS data_count FROM t1; DATA_COUNT ---------- 10000 1 row selected. Elapsed: 00:00:19.66 SQL> SELECT APPROX_COUNT_DISTINCT(data) AS data_count FROM t1; DATA_COUNT ---------- 10050 1 row selected. Elapsed: 00:00:10.46
Conceptually, for some reason this technique reminds me of a similar problem in the web development arena. Sometimes, close enough is just enough.
The “Blur Up” Technique for Loading Background Images