Querying the alert log via SQL

Quick tip regarding the Oracle database alert log (from 11g onwards). There is a fixed table X$DBGALERTEXT:


SQL> select message_text from X$DBGALERTEXT where rownum <= 30;

MESSAGE_TEXT
-----------------------------------------------------------------------------------------------------------------------------------------
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Initial number of CPU is 2
Number of processor cores in the system is 2
Number of processor sockets in the system is 1
Shared memory segment for instance monitoring created
CELL communication is configured to use 0 interface(s):
CELL IP affinity details:
NUMA status: non-NUMA system
cellaffinity.ora status: N/A
CELL communication will use 1 IP group(s):
Grp 0:
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as USE_DB_RECOVERY_FILE_DEST
Autotune of undo retention is turned on.
IMODE=BR
ILAT =27
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options.

ORACLE_HOME = /u01/app/oracle/product/11.2.0/orcl
System name:Linux
Node name:ODIGettingStarted
Release:2.6.39-400.17.1.el6uek.x86_64
Version:#1 SMP Fri Feb 22 18:16:18 PST 2013
Machine:x86_64
Using parameter settings in client-side pfile /u01/app/oracle/admin/orcl/pfile/init.ora on machine ODIGettingStarted
System parameters with non-default values:

30 rows selected.

My personal opinion? This can be useful if you're looking to create some custom alert log monitoring. However, I still prefer to monitor my alert logs using shell scripts, since accessing this X$ table requires the instance to be up and operational. But if you don't have access to the OS, this could be useful.

I also found the following Metalink note:
High CPU for Queries on X$DBGALERTEXT (Doc ID 2056666.1)

APPLIES TO:

Oracle Database – Enterprise Edition – Version 11.2.0.1 and later
Information in this document applies to any platform.

SYMPTOMS

  • Query on X$DBGALERTEXT consumes high CPU, taking a long time to complete. For example:
SELECT count(*)
FROM X$DBGALERTEXT
WHERE to_date(to_char(originating_timestamp, 'dd-mon-yyyy hh24:mi'), 'dd-mon-yyyy hh24:mi') >
      to_date(to_char(systimestamp - .00694, 'dd-mon-yyyy hh24:mi'), 'dd-mon-yyyy hh24:mi') /* last 10 minutes */
AND (
  message_text = 'ORA-00600'
  OR message_text LIKE '%fatal%'
  OR message_text LIKE '%error%'
  OR message_text LIKE '%ORA-%'
  OR message_text LIKE '%terminating the instance%'
);
  • It can also cause ORA-700 [dbgrfafr_1].
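If you do build monitoring on top of this fixed table, it may help to filter on ORIGINATING_TIMESTAMP directly instead of round-tripping it through TO_CHAR/TO_DATE as in the query above. A minimal sketch (the one-day window and the message filters are illustrative assumptions only, and whether this avoids the CPU issue in the note depends on your version and patch level):

-- Recent ORA- and error messages from the in-memory alert log view
SELECT originating_timestamp, message_text
FROM   X$DBGALERTEXT
WHERE  originating_timestamp > systimestamp - INTERVAL '1' DAY
AND    (message_text LIKE '%ORA-%' OR message_text LIKE '%error%');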

Comments in your SPFILE

The next time you change a parameter using an SPFILE, consider using the COMMENT= clause to document why the change was made:

SQL> alter system set pga_aggregate_target=512m comment='Changed 04-JUN-2018, AWR recommendation, MR';

System altered.

SQL> select value, update_comment from v$parameter where name = 'pga_aggregate_target';
VALUE      UPDATE_COMMENT
-------------------- ---------------------------------------------------------------------------------------------------
536870912  Changed 04-JUN-2018, AWR recommendation, MR

Introduction to Database Normalisation

Background

There are two approaches to relational database design.

* TOP DOWN: From Data Modeling (e.g. ER Model) to Relational Logical Model for implementation.

* BOTTOM UP: Normalization of Relations

Normalisation is a formal process for deciding which attributes should be grouped together in a relation so that all anomalies are removed. The aim is to successively reduce relations to produce smaller, well-structured relations.

Dependencies

Functional Dependency:
The simplest kind of dependency is called functional dependency (FD). The dependencies are best explained through examples.

For example, LecturerID -> LecturerName
is a valid FD because:

For each LecturerID there is at most one LecturerName, or
LecturerName is determined by LecturerID , or
LecturerName is uniquely determined by LecturerID , or
LecturerName depends on LecturerID .

Each of the above statements is equivalent.

Formally:

An FD X -> Y holds in a relation if any two rows that agree on X also agree on Y.

The FD X -> Y is a full dependency if no attribute can be removed from X.

LabDate, SubjectCode -> Tutor is a full dependency, that is, Tutor is fully dependent on both LabDate AND SubjectCode.

Partial Dependency:

The FD X -> Y is a partial dependency if an attribute can be removed from X and the dependency still holds.

LecturerID, SubjectCode -> LecturerName is a partial dependency, that is, LecturerName is partially dependent on LecturerID AND SubjectCode.

(to determine LecturerName, I only need to know LecturerID).

Transitive Dependency:

Dependencies can be transitive.

For example, if one lecturer can teach one subject and each subject only has one tutor, then we might have the dependencies:

LecturerID -> SubjectCode
SubjectCode -> Tutor
and, transitively LecturerID -> Tutor.

Normal Forms:

Functional dependencies can be used to decide whether a schema is well designed.

For example, in the following relation:

LecturerSubject (LecturerID, LecturerName, SubjectCode, SubjectName)

Anomalies?
If there is a new subject which has not been allocated a lecturer, can you record the details of this subject in the above table? (Insert Anomaly)

If an existing subject changes its name, can you make the change in one place only? (Update Anomaly)

If a lecturer resigns and their details are to be deleted, is there a chance that some subjects will be removed permanently and we won't have any record of those subjects anymore? (Delete Anomaly)

Design errors in relations, such as the potential for certain kinds of anomalies, can be categorised. These categories of error can be successively eliminated by decomposing relations into normal forms.

The main normal forms are first (1NF), second (2NF), third (3NF), and Boyce-Codd (BCNF); higher normal forms include fourth (4NF) and fifth (5NF). Problems addressed by 4NF and 5NF rarely occur in practice, and database designers in industry normally do not need to reach the highest possible normal form, so we will focus on satisfying 3NF.

First Normal Form (1NF):

A relation is in 1NF if:

  • There are no repeating groups
  • A primary key has been defined, which uniquely identifies each row in the relation.
  • All attributes are functionally dependent on all or part of the key.
  • Attributes are stored as atomic values -> each field can only contain one piece of data. E.g. a name field containing “Fred Smith” holds both a first name and a surname, violating 1NF.

Second Normal Form (2NF):

A relation is in 2NF if:

  • The relation is in 1NF
  • All non-key attributes are fully functionally dependent on the entire key (partial dependency has been removed).

Third Normal Form (3NF):

A relation is in 3NF if:

  • The relation is in 2NF
  • All transitive dependencies have been removed. (Transitive dependency: non-key attribute dependent on another non-key attribute.)

Example

Normalize the ORDER form below:

From the ORDER FORM (user view) we can derive the ORDER relation:

Currently in UNF (Un-normalized Form)

ORDER
(Order #, Customer #, Customer Name, Customer Address, City, State, PostCode, Order Date, (Product #, Description, Quantity, Unit Price))

Note that the order form is not in 1NF because there is a repeating group
(Product#, Description….).

To convert the above relation into 1NF, the repeating group must be removed by creating a new relation based on the repeating group along with the primary key of the main relation.

1NF:

ORDER
(Order#, Customer#, Customer Name,  Address, City, State, PostCode, OrderDate)

ORDER_PRODUCT
(Order#, Product#, Description, Quantity, Unit Price)    —> Note that Order# is a foreign key as well as part of the PK

Anomalies:

Insertion Anomalies: cannot insert a new product until there is an order for that product.

Deletion Anomalies: if an order is deleted, the details of the product are also deleted.

Update Anomalies: if the detail of a particular product needs to be updated, each order that contains that product has to be updated.

2NF (Partial dependencies):

The ORDER_PRODUCT relation is not in 2NF because not all non-key attributes are fully dependent on the entire key (the PK is the combination of Order# and Product#, but Description and Unit Price depend on Product# alone, not on Order#).

To convert the ORDER_PRODUCT relation into 2NF, a new relation must be created which consists of part of the key (which becomes the primary key of the new relation) and all non-key attributes that are dependent on that partial key.

ORDER_PRODUCT
(Order#, Product#, Quantity)

PRODUCT
(Product#, Description, Unit_Price)

The ORDER relation is already in 2NF as there are no non-key attributes that depend on only part of the key (ORDER has a single-attribute key).

Anomalies:

Insert Anomalies: a new customer cannot be inserted until he/she has an order.

Delete Anomalies: if an order is deleted, all of the customer's information is also deleted.

Update Anomalies: if a customer's details are to be updated, every order for that customer needs to be updated.

3NF (Transitive Dependencies):

The ORDER relation is not in 3NF because there is a transitive dependency (a non-key attribute dependent on another non-key attribute), e.g. Customer Name, Address, City, etc. all depend on Customer#, which is currently a non-key attribute.

To convert the relation into 3NF, a new relation must be created for the non-key attributes that are dependent on another non-key attribute.

CUSTOMER
(Customer#, Customer Name, Customer Address, City, State, PostCode)

ORDER
(Order#, Customer#, Order Date)       —> Remember to always maintain FK links (Customer# is now a foreign key to CUSTOMER)

Both the ORDER_PRODUCT and the PRODUCT relations are already in 3NF.

ORDER_PRODUCT(Order#, Product#, Quantity)
PRODUCT (Product#, Description, Unit Price)

Example Solution: Final Relations in 3NF and BCNF

ORDER
(Order#, Customer#, OrderDate)

CUSTOMER
(Customer#, CustomerName, CustomerAddress, City, State, PostCode)

ORDER_PRODUCT
(Order#, Product#, Quantity)

PRODUCT
(Product#, Description, UnitPrice)
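To make the final design concrete, here is a minimal DDL sketch of these relations. The table and column names follow the example above, but the data types, sizes and constraint names are assumptions for illustration only, and the ORDER relation is created as ORDERS because ORDER is a reserved word in Oracle:

CREATE TABLE customer (
  customer_no    NUMBER         PRIMARY KEY,
  customer_name  VARCHAR2(100)  NOT NULL,
  address        VARCHAR2(200),
  city           VARCHAR2(50),
  state          VARCHAR2(30),
  postcode       VARCHAR2(10)
);

CREATE TABLE orders (
  order_no     NUMBER  PRIMARY KEY,
  customer_no  NUMBER  NOT NULL REFERENCES customer (customer_no),  -- FK to CUSTOMER
  order_date   DATE    NOT NULL
);

CREATE TABLE product (
  product_no   NUMBER  PRIMARY KEY,
  description  VARCHAR2(100),
  unit_price   NUMBER(10,2)
);

CREATE TABLE order_product (
  order_no    NUMBER  NOT NULL REFERENCES orders (order_no),     -- FK to ORDERS
  product_no  NUMBER  NOT NULL REFERENCES product (product_no),  -- FK to PRODUCT
  quantity    NUMBER  NOT NULL,
  CONSTRAINT order_product_pk PRIMARY KEY (order_no, product_no)
);

Each foreign key above mirrors one of the FK links noted in the decomposition.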

Now an example for everyone at home to try:


Here is a suggested solution:

1NF: Identify the PK.

PATIENTHISTORY(PatientNo, name, address, suburb, date, time, drNo, drName, visitCode, description)

2NF: Remove partial dependencies. Notice that the PK is currently the combination of PatientNo, date and time. name, address and suburb depend only on PART OF THE KEY (PatientNo). PatientNo, date, time -> drNo, so this is not a partial dependency; the same applies to visitCode. We will deal with drName and description shortly.

PATIENT (PatientNo, name, address, suburb)

PATIENT_HISTORY(PatientNo, date, time, drNo, drName, visitCode, description) —> Note that PatientNo is now a foreign key as well as part of the key.

3NF: Remove transitive dependencies. Notice how drName is dependent on the non-key attribute drNo. The same principle applies to description: description is determined by visitCode.

DOCTOR (drNo, drName)

ILLNESS (visitCode, description)

PATIENT (PatientNo, name, address, suburb)

PATIENT_HISTORY (PatientNo, date, time, drNo, visitCode)  —> drNo and visitCode are foreign keys (pointing to the DOCTOR and ILLNESS tables, respectively).
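A useful sanity check on any decomposition is confirming that the original user view can be rebuilt by joining the new relations back together. A hedged sketch (names follow the relations above, except that the date and time attributes are assumed to be stored as visit_date and visit_time, since DATE is a reserved word in Oracle):

-- Reconstruct the original PATIENTHISTORY view from the 3NF tables
SELECT p.patientno, p.name, p.address, p.suburb,
       h.visit_date, h.visit_time,
       d.drno, d.drname,
       i.visitcode, i.description
FROM   patient p
JOIN   patient_history h ON h.patientno = p.patientno
JOIN   doctor d          ON d.drno      = h.drno
JOIN   illness i         ON i.visitcode = h.visitcode;

If this join reproduces every row of the original PATIENTHISTORY listing, the decomposition has not lost any information.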

Quote

Relational Databases

We were hiring developers for a large, complicated application. I happened to mention to one of the more promising candidates that the application was fairly write-heavy and we might experience some performance concerns later. This developer, with years of experience and skills across multiple technologies that might have been of use to us, replied “you’re going to either need to shard your database or switch to NoSQL.”

That was enough to guarantee they didn’t get the job.

Before relational databases, all databases were NoSQL. That’s why we have relational databases.

Great read on why database design is so important

Aside

Genesis of a DBA Universe

In the beginning was the disk array, and all was empty and raw, and Unix moved over the face of the platters. And the DBA said: Let there be Oracle. And there was Oracle. And the environmental variables were set and the disks were striped and mirrored and the OFA was established, and behold spindle was rent asunder from spindle. And the DBA saw that all was in spec.

And it was day and it was evening of the first day.

And the DBA said: Let there be scripts. And sql.bsq brought forth myriad crawling things upon the face of the array. And catalog.sql brought forth all manner of tables and views that swim unseen beneath the waters. And catproc.sql brought forth all the built-in programs and all the hosts of the air, that the users might be given wings and take flight over the data.

And it was day and it was evening of the second day.

And the DBA said: Let there be tablespaces. And there were tablespaces. And the network administrator looked upon the disk array and did see what the tablespaces had wrought upon the disk arrays, and he did gnash his teeth and seek a new work upon the Internet with an engine of search.

And it was day and it was evening of the third day.

And the DBA created users. Male and female he created them. And he said unto the users: Thou mayest create tables and views as thou wilt. Yea, thou mayest create even indexes upon the data. Only meddle not with the system tablespace, for it is a holy place, and on the day wherein thou treadest upon it, on that day thy create session shall surely be revoked. And the serpent crept among the users and whispered to them, saying: Thine roles shall not be revoked. Taste ye all of the system tablespace, for ye shall know of b-trees and hints and ye shall be as DBAs. And the users heeded the serpent and filled the system tablespace with crap. And the instance did crash and the client did wax wroth at the DBA. And the DBA did gnash his teeth and partake of the fruit of the vine, for behold the users were permanent employees and the DBA was but a contractor and could not revoke their create session.

And it was day and it was evening of the fourth day.

And the DBA did set default tablespaces and temporary tablespaces and did lock down all that was upon the face of the array with roles and profiles and all manner of quotas, yea even from the rollback segments even unto the archived redo logs.

And it was day and it was evening of the fifth day.

And the DBA created synonyms and links and did tune the server and apply patches upon the face of the database.

And it was day and it was evening of the sixth day.

And on the seventh day the DBA did rest from all the labors of the creation. And his pager did ring and he ceased from resting and did spend his sabbath on the telephone with Oracle support. And by the time the DBA got through to someone who knew whereof they spake behold it was day and it was evening of the eighth day.

And the DBA waxed wroth.

Approximate Query Processing in Oracle Database (12.2)

The growing market for Big Data has created a lot of interest around approximate query processing (AQP) as a means of achieving interactive response times. In my previous post I explained one of the better-known approximate functions, APPROX_COUNT_DISTINCT, with an example of how approximate query processing obtains approximate results with negligible deviation from the exact result while dramatically reducing processing resources. In this post I will describe how to use some of the other new features related to approximate query processing.

About Approximate Query Processing

Approximate query processing uses SQL functions to provide real-time responses to explorative queries where approximations are acceptable. A query containing SQL functions that return approximate results is referred to as an approximate query. Business intelligence (BI) applications extensively use aggregate functions, including analytic functions, to provide answers to common business queries. For some types of queries, when the data set is extremely large, providing exact answers can be resource intensive. For example, counting the number of unique customer sessions on a website or establishing the median house price within each post code across a state.

In certain scenarios, these types of queries may not require exact answers because you are more interested in approximate trends or patterns, which can then be used to drive further analysis. Approximate query processing is primarily used in data discovery applications to return quick answers to explorative queries. Users typically want to locate interesting data points within large amounts of data and then drill down to uncover further levels of detail. For explorative queries, quick responses are more important than exact values.

Oracle provides a set of SQL functions that enable you to obtain approximate results with negligible deviation from the exact result. There are additional approximate functions that support materialized view based summary aggregation strategies. The functions that provide approximate results are as follows:

  • APPROX_COUNT_DISTINCT
  • APPROX_COUNT_DISTINCT_DETAIL
  • APPROX_COUNT_DISTINCT_AGG
  • TO_APPROX_COUNT_DISTINCT
  • APPROX_MEDIAN
  • APPROX_PERCENTILE
  • APPROX_PERCENTILE_DETAIL
  • APPROX_PERCENTILE_AGG
  • TO_APPROX_PERCENTILE
Approximate query processing can be used without any changes to your existing code. When you set the appropriate initialization parameters, Oracle Database replaces exact functions in queries with the corresponding SQL functions that return approximate results.
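The DETAIL/AGG/TO_APPROX variants in the list above are intended for summary aggregation strategies such as materialized views: you store the approximate "detail" at a fine grouping level and roll it up later without rescanning the base table. A rough sketch of the pattern against a hypothetical SALES table (prod_id, sale_month and cust_id are assumed columns; check the 12.2 documentation for the exact options):

-- Store reusable approximate NDV detail at (prod_id, sale_month) level.
CREATE MATERIALIZED VIEW sales_approx_mv AS
SELECT prod_id,
       sale_month,
       APPROX_COUNT_DISTINCT_DETAIL(cust_id) AS cust_detail
FROM   sales
GROUP  BY prod_id, sale_month;

-- Later, roll the stored detail up to product level without touching SALES.
SELECT prod_id,
       TO_APPROX_COUNT_DISTINCT(APPROX_COUNT_DISTINCT_AGG(cust_detail)) AS approx_custs
FROM   sales_approx_mv
GROUP  BY prod_id;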

Running Queries Containing Exact Functions Using SQL Functions that Return Approximate Values

The interesting part is that you can utilize the approximate functions without changing code. Three initialization parameters control which functions should be treated as approximate functions at runtime:

  • approx_for_aggregation
  • approx_for_count_distinct
  • approx_for_percentile

To replace only the COUNT(DISTINCT) function in queries with the APPROX_COUNT_DISTINCT function, set the approx_for_count_distinct initialization parameter to TRUE for the current session or for the entire database.

To replace percentile functions with the corresponding functions that return approximate results, set approx_for_percentile to PERCENTILE_CONT, PERCENTILE_DISC, or ALL (replaces all percentile functions) for the current session or for the entire database.

To run all queries using the corresponding SQL functions that return approximate results instead of the specified SQL functions, set the approx_for_aggregation initialization parameter to TRUE for the current session or for the entire database.
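As a quick illustration, the three parameters can be set at the session level (or database-wide with ALTER SYSTEM). This is a minimal sketch; check the accepted values for your exact version:

-- Replace COUNT(DISTINCT ...) with APPROX_COUNT_DISTINCT for this session.
ALTER SESSION SET approx_for_count_distinct = TRUE;

-- Replace PERCENTILE_CONT and PERCENTILE_DISC with their approximate equivalents.
ALTER SESSION SET approx_for_percentile = 'ALL';

-- Or run all supported aggregations approximately.
ALTER SESSION SET approx_for_aggregation = TRUE;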

When Should Approximate Query Processing (AQP) Be Used?

If you work in a billing department, stop reading right now 🙂

1. Use AQP when you care about interactive response times
Exploratory data analytics is typically an interactive and iterative process: you form an initial hypothesis (e.g. by visualizing and looking at the data), run some queries, modify your queries based on the previous answers, and so on until you find a satisfactory explanation. You cannot be productive if you have to wait half an hour every time you run a new query to test a hypothesis.

2. Use AQP when you can make perfect decisions with imperfect answers
Exact results have no advantage over approximate ones if they both lead to the exact same conclusion/decision. This obviously depends on the quality of your approximation and the application logic.

3. Use AQP when your data is incomplete or noisy
Believe it or not, most of the data that is collected and used in the real world is extremely noisy. So the idea that processing your entire data set gives you 100% accurate answers is usually an illusion anyway. In other words, if my data is noisy to begin with, I might as well take advantage of approximation, which introduces a controlled degree of noise into my computation in exchange for significant speed-ups. This noise in your data can come from many sources: human error, missing values, white noise, data extraction errors, data conversion errors.

4. Use AQP when your goal is to predict something
In general, there are many cases where the output of database queries is consumed by predictive analytics or machine learning algorithms, and in those cases returning smaller samples of the original data can be a great way to gain considerable performance benefits.

Additional Reading: Oracle Database New Features Guide 12c Release 2 (12.2)

APPROX_COUNT_DISTINCT– Quick Distinct Count (12c)

Many organisations like to count things. Counting the number of distinct elements (i.e. the cardinality) of a set is a challenge when the cardinality is large. I see many use cases where an approximate result for a given query would be sufficient, rather than a completely accurate result. For example, you may have the query: "How many distinct individuals visited our website last week?" Do you need an exact number? I would like to draw your attention to a new SQL function introduced in 12c: APPROX_COUNT_DISTINCT.

The APPROX_COUNT_DISTINCT function was added, but not documented, in Oracle 11g to improve the speed of calculating the number of distinct values (NDV) when gathering statistics using the DBMS_STATS package. Oracle Database 12c (12.1.0.2) now includes the function in the documentation so we are free to use it in our applications as a supported SQL function.

Syntax

APPROX_COUNT_DISTINCT(expr)

Purpose

APPROX_COUNT_DISTINCT returns the approximate number of rows that contain distinct values of expr.

This function provides an alternative to the COUNT (DISTINCT expr) function, which returns the exact number of rows that contain distinct values of expr. APPROX_COUNT_DISTINCT processes large amounts of data significantly faster than COUNT, with negligible deviation from the exact result.

For expr, you can specify a column of any scalar data type other than BFILE, BLOB, CLOB, LONG, LONG RAW, or NCLOB.

APPROX_COUNT_DISTINCT ignores rows that contain a null value for expr. This function returns a NUMBER.

Basic Usage

In previous database versions, if we wanted to perform a count of distinct values, we would probably have done something like the following.

SELECT COUNT(DISTINCT object_name) AS obj_count
FROM   all_objects;

 OBJ_COUNT
----------
     47171

1 row selected.

This query gives us the exact count of the distinct values based on Oracle’s read-consistency model. As such, we see all the committed data, along with any uncommitted changes made by the current session.

In contrast, the new APPROX_COUNT_DISTINCT function does not provide exact results, but should give “negligible deviation from the exact result” according to the documentation.

SELECT APPROX_COUNT_DISTINCT(object_name) AS obj_count
FROM   all_objects;

 OBJ_COUNT
----------
     46788

1 row selected.

Performance

DROP TABLE t1 PURGE;

CREATE TABLE t1 AS
SELECT level AS data
FROM   dual
CONNECT BY level <= 10000;

INSERT /*+ APPEND */ INTO t1
SELECT a.data FROM t1 a
CROSS JOIN t1 b;

COMMIT;

EXEC DBMS_STATS.gather_table_stats(USER,'T1');

Now we have a table with a little over 100 million rows and 10,000 distinct values, and the difference in performance between the two methods is more significant.

SET TIMING ON

SELECT COUNT(DISTINCT data) AS data_count
FROM   t1;

DATA_COUNT
----------
     10000

1 row selected.

Elapsed: 00:00:19.66
SQL>


SELECT APPROX_COUNT_DISTINCT(data) AS data_count
FROM   t1;

DATA_COUNT
----------
     10050

1 row selected.

Elapsed: 00:00:10.46

Conceptually, for some reason this technique reminds me of a similar problem in the web development arena. Sometimes, close enough is just enough.
The “Blur Up” Technique for Loading Background Images

Aside

12 Days of Denormalization

On the first day of denormalization, my design gave to me
a really fast-running query.

On the second day of denormalization, my design gave to me
2 less tables.

On the third day of denormalization, my design gave to me
3 more indexes.

On the fourth day of denormalization, my design gave to me
4 larger disks.

On the fifth day of denormalization, my design gave to me
5 brand new reqs.

On the sixth day of denormalization, my design gave to me
6 times the locking.

On the seventh day of denormalization, my design gave to me
7 longer updates.

On the eighth day of denormalization, my design gave to me
8 more requirements.

On the ninth day of denormalization, my design gave to me
9 invalid rows.

On the tenth day of denormalization, my design gave to me
10 delays deleting.

On the eleventh day of denormalization, my design gave to me
11 questionable queries.

On the twelfth day of denormalization, my design gave to me
12 lessons learned.