Introduction to Database Normalisation

Background

There are two approaches in relational database design.

* TOP DOWN: From Data Modeling (eg. ER Model) to Relational Logical Model for implementation.

* BOTTOM UP: Normalization of Relations

Normalisation is a formal process for deciding which attributes should be grouped together in a relation so that all anomalies are removed. Hence the aim is to successively reduce relations to produce smaller, well structured relations.

Dependencies

Functional Dependency:
The simplest kind of dependency is called functional dependency (FD). The dependencies are best explained through examples.

For example, LecturerID -> LecturerName
is a valid FD because:

For each LecturerID there is at most one LecturerName, or
LecturerName is determined by LecturerID , or
LecturerName is uniquely determined by LecturerID , or
LecturerName depends on LecturerID .

Each of the above statements is equivalent.

Formally;

The FD  X -> Y is a full dependency if no attribute can be removed from X.

LabDate, SubjectCode ->  Tutor is a full dependency, that is, Tutor is fully dependent on both LabDate AND SubjectCode.

Partial Dependency:

The FD  X -> Y is a partial dependency if an attribute can be removed from X.

LecturerID, SubjectCode -> LecturerName is a partial dependency, that is, LecturerName is partially dependent on LecturerID AND SubjectCode.

(to determine LecturerName, I only need to know LecturerID).

Transitive Dependency:

Dependencies can be transitive.

For example, if one lecturer can teach one subject and each subject only has one tutor, then we might have the dependencies:

LecturerID -> SubjectCode
SubjectCode -> Tutor
and, transitively LecturerID -> Tutor.

Normal Forms:

Functional dependencies can be used to decide whether a schema is well designed.

For example, in the following relation:

LecturerSubject (LecturerID, LecturerName, SubjectCode, SubjectName)

Anomalies?
If there is a new subject which has not been allocated a lecturer, can you record the details of this subject in the above table? (Insert Anomaly)

If an existing subject changes the name, can you do the changes to one instance only? (Update Anomaly)

If a lecturer resigns and the details are to be deleted, would there be a chance that some subjects will be removed permanently and we won’t have any track record of those subjects anymore? (Delete Anomaly)

Design errors in relations, such as the potential for certain kinds of anomalies, can be categorised. These categories of error can be successively eliminated by decomposing relations into normal forms.

The major/main normal forms are first (1NF), second (2NF), third (3NF), and Boyce ­Codd (BCNF). Higher/advanced normal forms including fourth (4NF), and fifth (5NF). Because problems with 4NF and 5NF rarely occur, moreover database designers in industry normally do not need to use the highest possible NF for practical reasons. We will focus on satisfying 3NF level.

First Normal Form (1NF):

A relation is in 1NF if:

  • There are no repeating groups
  • A primary key has been defined, which uniquely identifies each row in the relation.
  • All attributes are functionally dependent on all or part of the key.
  • Attributes should be stored as atomic values -> Each field entry can only contain one piece of data. E.g. A name field containing “Fred Smith” has surname and first name, violating 1NF.

Second Normal Form (2NF):

A relation is in 2NF if:

  • The relation is in 1 NF
  • All non-key attributes are fully functionally dependent on the entire key (partial dependency has been removed).

Third Normal Form (3NF):

A relation is in 3NF if:

  • The relation is in 2NF
  • All transitive dependencies have been removed. (Transitive dependency: non-key attribute dependent on another non-key attribute.)

Example

Normalize the ORDER form below:

From the ORDER FORM (user view) we can derive ORDER relation:

Currently in UNF (Un-normalized Form)

ORDER
(Order #, Customer #, Customer Name, Customer Address, City, State, PostCode, Order Date, (Product #, Description, Quantity, Unit Price))

Note that the order form is not in 1NF because there is a repeating group
(Product#, Description….).

To convert the above relation into 1NF, the repeating group must be removed by creating a new relation based on the repeating group along with the primary key of the main relation.

1NF:

ORDER
(Order#, Customer#, Customer Name,  Address, City, State, PostCode, OrderDate)

ORDER_PRODUCT
(Order#, Product#, Description, Quantity, Unit Price)    —-> Note that Order# is also a foreign key as well as a PK

Anomalies:

Insertion Anomalies: cannot insert a new product until there is an order for that product.

Deletion Anomalies: if an order is deleted the whole detail of the product will also be deleted.

Update Anomalies: if the detail of a particular product needs to be updated, each order that contains that product has to be updated.

2NF (Partial dependencies):

The ORDER_PRODUCT relation is not in 2NF because not all non-key attributes are fully dependent on the entire key (e.g. the PK is the combination of order# and product#. But description and unit price depend on product#, not order#)

To convert the ORDER_PRODUCT relation into 2NF, a new relation must be created which consists of part of the keys (becomes the primary key of the new relation) and all non key attributes that are dependent on the partial key.

ORDER_PRODUCT
(Order#, Product#, Quantity)

PRODUCT
(Product#, Description, Unit_Price)

The ORDER relation is already in 2NF as there are no non key attributes that are dependent on partial key (ORDER only has a single key).

Anomalies:

Insert Anomalies: a new customer cannot be inserted until he/she has an order.

Delete Anomalies: if an order is deleted, the whole information of the customer is also deleted.

Update Anomalies: if a customer detail is to be updated, all orders for that  customer need to be updated.

3NF (Transitive Dependencies):

The ORDER relation is not in 3NF because there is a transitive dependency (non-key attribute dependent on another non-key attribute). e.g. customerName, city, Address, etc. all depend on customer#, which is currently a non-key attribute.

To convert the relation into 3NF, a new relation must be created for the non-key attributes that are dependent to another non-key attribute.

CUSTOMER
(Customer#, Customer Name, Customer Address, City, State, PostCode)

ORDER
(Order#Customer#, Order Date)       —-> Remember to always maintain FK links

Both the order ORDER_PRODUCT and the PRODUCT relations are already in 3NF.

ORDER_PRODUCT(Order#, Product#, Quantity)
PRODUCT (Product#, Description, Unit Price)

Example Solution: Final Relations in 3NF and BCNF

ORDER
(Order#Customer#, OrderDate)

CUSTOMER
(Customer#, CustomerName, CustomerAddress, City, State, PostCode)

ORDER_PRODUCT
(Order#, Product#, Quantity)

PRODUCT
(Product#, Description, UnitPrice)

Now an example for everyone at home to try:


Here is a suggested solution:

1NF: Identify the PK.

PATIENTHISTORY(PatientNo, name, address, suburb, date, time, drNo, drName, visitCode, description)

2NF: Remove partial dependencies. Notice that currently the PK is the combination of PatientNo, date, time. name, address, and suburb only depend on PART OF THE KEY (patientNo). patientNo, date, time -> drNo so this is not a partial dependency. Same as visitcode. We will deal with drName and description shortly.

PATIENT (PatientNo, name, address, suburb)

PATIENT_HISTORY(PatientNo, date, time, drNo, drName, visitCode, description) —> Note that PatientNo is now a foreign key as well as part of the key.

3NF: Remove transitive dependency. Notice how drName is dependency on the non key drNo. Same principle for description. description is determined by visitCode.

DOCTOR (drNo, drName)

ILLNESS (visitCode, description)

PATIENT (PatientNo, name, address, suburb)

PATIENT_HISTORY (PatientNo, date, timeDrNo, visitCode)  —> DrNo and visitCode are foreign keys (pointing to the doctor and illness tables, respectively).