Introduction to Database Normalisation

Background

There are two approaches in relational database design.

* TOP DOWN: From Data Modeling (eg. ER Model) to Relational Logical Model for implementation.

* BOTTOM UP: Normalization of Relations

Normalisation is a formal process for deciding which attributes should be grouped together in a relation so that all anomalies are removed. Hence the aim is to successively reduce relations to produce smaller, well structured relations.

Dependencies

Functional Dependency:
The simplest kind of dependency is called functional dependency (FD). The dependencies are best explained through examples.

For example, LecturerID -> LecturerName
is a valid FD because:

For each LecturerID there is at most one LecturerName, or
LecturerName is determined by LecturerID , or
LecturerName is uniquely determined by LecturerID , or
LecturerName depends on LecturerID .

Each of the above statements is equivalent.
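To make the idea concrete, here is a minimal sketch of how one might test an FD against some sample rows. The helper name `fd_holds` and the sample data are invented for illustration. Keep in mind that an FD is really a rule about every possible instance of a relation, so sample data can only disprove an FD, never prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are lists of attribute names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Remember the first value seen for each key; any later row
        # with the same key must agree with it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample data, invented for illustration only.
lecturers = [
    {"LecturerID": 1, "LecturerName": "Ada", "SubjectCode": "DB101"},
    {"LecturerID": 1, "LecturerName": "Ada", "SubjectCode": "NW201"},
    {"LecturerID": 2, "LecturerName": "Alan", "SubjectCode": "DB101"},
]

print(fd_holds(lecturers, ["LecturerID"], ["LecturerName"]))  # True
print(fd_holds(lecturers, ["LecturerID"], ["SubjectCode"]))   # False
```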

Full Dependency:

Formally, the FD X -> Y is a full dependency if no attribute can be removed from X without the dependency ceasing to hold.

LabDate, SubjectCode ->  Tutor is a full dependency, that is, Tutor is fully dependent on both LabDate AND SubjectCode.

Partial Dependency:

The FD X -> Y is a partial dependency if an attribute can be removed from X and the dependency still holds.

LecturerID, SubjectCode -> LecturerName is a partial dependency, that is, LecturerName is partially dependent on LecturerID AND SubjectCode.

(to determine LecturerName, I only need to know LecturerID).
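The difference between full and partial dependencies can be sketched in a few lines. This is a simplified check, assuming we are given a list of declared FDs and do not try to infer any new ones; the function name `is_partial` and the list `declared_fds` are made up for this example.

```python
def is_partial(lhs, rhs, declared_fds):
    """True if a proper subset of lhs is already declared to determine rhs."""
    lhs, rhs = set(lhs), set(rhs)
    for determinant, dependents in declared_fds:
        # "<" on sets means "is a proper subset of".
        if determinant < lhs and rhs <= dependents:
            return True
    return False


# The FDs used in the examples above, written as (determinant, dependents).
declared_fds = [
    ({"LecturerID"}, {"LecturerName"}),
    ({"LabDate", "SubjectCode"}, {"Tutor"}),
]

print(is_partial(["LecturerID", "SubjectCode"], ["LecturerName"], declared_fds))  # True
print(is_partial(["LabDate", "SubjectCode"], ["Tutor"], declared_fds))            # False
```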

Transitive Dependency:

Dependencies can be transitive.

For example, if each lecturer teaches only one subject and each subject has only one tutor, then we might have the dependencies:

LecturerID -> SubjectCode
SubjectCode -> Tutor
and, transitively LecturerID -> Tutor.
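One way to picture this is to treat each FD as a lookup table and chain the two lookups together. The IDs and codes below are invented sample values, not part of the original example.

```python
# LecturerID -> SubjectCode (hypothetical sample values)
lecturer_subject = {"L1": "DB101", "L2": "NW201"}

# SubjectCode -> Tutor (hypothetical sample values)
subject_tutor = {"DB101": "T7", "NW201": "T9"}

# Chaining the two lookups gives LecturerID -> Tutor.
lecturer_tutor = {
    lecturer: subject_tutor[subject]
    for lecturer, subject in lecturer_subject.items()
}

print(lecturer_tutor)  # {'L1': 'T7', 'L2': 'T9'}
```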

Normal Forms:

Functional dependencies can be used to decide whether a schema is well designed.

For example, in the following relation:

LecturerSubject (LecturerID, LecturerName, SubjectCode, SubjectName)

Anomalies?
If there is a new subject which has not been allocated a lecturer, can you record the details of this subject in the above table? (Insert Anomaly)

If an existing subject changes the name, can you do the changes to one instance only? (Update Anomaly)

If a lecturer resigns and the details are to be deleted, would there be a chance that some subjects will be removed permanently and we won’t have any track record of those subjects anymore? (Delete Anomaly)
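The update and delete anomalies are easy to see with a handful of rows. The sketch below uses made-up sample data for the LecturerSubject relation, held as a plain list of dictionaries rather than in a real database.

```python
# Hypothetical rows for LecturerSubject, invented for illustration.
lecturer_subject = [
    {"LecturerID": 1, "LecturerName": "Ada", "SubjectCode": "DB101", "SubjectName": "Databases"},
    {"LecturerID": 2, "LecturerName": "Alan", "SubjectCode": "DB101", "SubjectName": "Databases"},
    {"LecturerID": 2, "LecturerName": "Alan", "SubjectCode": "NW201", "SubjectName": "Networks"},
]

# Update anomaly: renaming DB101 means touching every row that mentions it.
rows_to_update = [r for r in lecturer_subject if r["SubjectCode"] == "DB101"]
print(len(rows_to_update))  # 2

# Delete anomaly: removing lecturer 2 also removes the only record of NW201.
remaining = [r for r in lecturer_subject if r["LecturerID"] != 2]
subjects_left = {r["SubjectCode"] for r in remaining}
print("NW201" in subjects_left)  # False
```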

Design errors in relations, such as the potential for certain kinds of anomalies, can be categorised. These categories of error can be successively eliminated by decomposing relations into normal forms.

The main normal forms are first (1NF), second (2NF), third (3NF), and Boyce-Codd (BCNF). The higher normal forms include fourth (4NF) and fifth (5NF). Problems with 4NF and 5NF rarely occur, and database designers in industry normally do not need the highest possible normal form for practical reasons, so we will focus on satisfying 3NF.

First Normal Form (1NF):

A relation is in 1NF if:

  • There are no repeating groups
  • A primary key has been defined, which uniquely identifies each row in the relation.
  • All attributes are functionally dependent on all or part of the key.
  • Attributes should be stored as atomic values -> Each field entry can only contain one piece of data. E.g. A name field containing “Fred Smith” has surname and first name, violating 1NF.

Second Normal Form (2NF):

A relation is in 2NF if:

  • The relation is in 1NF
  • All non-key attributes are fully functionally dependent on the entire key (partial dependency has been removed).

Third Normal Form (3NF):

A relation is in 3NF if:

  • The relation is in 2NF
  • All transitive dependencies have been removed. (Transitive dependency: non-key attribute dependent on another non-key attribute.)

Example

Normalize the ORDER form below:

From the ORDER FORM (user view) we can derive ORDER relation:

Currently in UNF (Un-normalized Form)

ORDER
(Order #, Customer #, Customer Name, Customer Address, City, State, PostCode, Order Date, (Product #, Description, Quantity, Unit Price))

Note that the order form is not in 1NF because there is a repeating group
(Product #, Description, Quantity, Unit Price).

To convert the above relation into 1NF, the repeating group must be removed by creating a new relation based on the repeating group along with the primary key of the main relation.

1NF:

ORDER
(Order#, Customer#, Customer Name,  Address, City, State, PostCode, OrderDate)

ORDER_PRODUCT
(Order#, Product#, Description, Quantity, Unit Price)    --> Note that Order# is a foreign key as well as part of the PK

Anomalies:

Insertion Anomalies: cannot insert a new product until there is an order for that product.

Deletion Anomalies: if an order is deleted, the details of the products on that order may be lost as well.

Update Anomalies: if the detail of a particular product needs to be updated, each order that contains that product has to be updated.

2NF (Partial dependencies):

The ORDER_PRODUCT relation is not in 2NF because not all non-key attributes are fully dependent on the entire key. The PK is the combination of Order# and Product#, but Description and Unit Price depend on Product# alone, not on Order#.

To convert the ORDER_PRODUCT relation into 2NF, a new relation must be created which consists of the part of the key that acts as a determinant (this becomes the primary key of the new relation) and all non-key attributes that depend on that partial key.

ORDER_PRODUCT
(Order#, Product#, Quantity)

PRODUCT
(Product#, Description, Unit_Price)

The ORDER relation is already in 2NF because no non-key attribute can depend on part of the key (ORDER has a single-attribute key).
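The 2NF split of ORDER_PRODUCT is really just two projections of the original rows. Here is a small sketch with invented sample rows; the helper `project` is a made-up name. Joining the two projections back together on Product# should give back exactly the rows we started with, which shows that nothing was lost in the split.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicates but keeping first-seen order."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


# Hypothetical ORDER_PRODUCT rows in 1NF, invented for illustration.
order_product_1nf = [
    {"Order#": 1, "Product#": "P1", "Description": "Widget", "Quantity": 2, "UnitPrice": 5.0},
    {"Order#": 1, "Product#": "P2", "Description": "Gadget", "Quantity": 1, "UnitPrice": 9.5},
    {"Order#": 2, "Product#": "P1", "Description": "Widget", "Quantity": 4, "UnitPrice": 5.0},
]

order_product = project(order_product_1nf, ["Order#", "Product#", "Quantity"])
product = project(order_product_1nf, ["Product#", "Description", "UnitPrice"])

print(len(order_product))  # 3
print(len(product))        # 2 (each product is now described once)

# Joining on Product# rebuilds the original rows.
rejoined = [
    {**op, **p}
    for op in order_product
    for p in product
    if op["Product#"] == p["Product#"]
]
print(rejoined == order_product_1nf)  # True
```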

Anomalies:

Insert Anomalies: a new customer cannot be inserted until he/she has an order.

Delete Anomalies: if an order is deleted, all the information about its customer may be deleted too.

Update Anomalies: if a customer's details are to be updated, all orders for that customer need to be updated.

3NF (Transitive Dependencies):

The ORDER relation is not in 3NF because there is a transitive dependency (a non-key attribute dependent on another non-key attribute). For example, Customer Name, Customer Address, City, etc. all depend on Customer#, which is currently a non-key attribute.

To convert the relation into 3NF, a new relation must be created for the non-key attributes that are dependent on another non-key attribute.

CUSTOMER
(Customer#, Customer Name, Customer Address, City, State, PostCode)

ORDER
(Order#, Customer#, Order Date)       --> Remember to always maintain FK links

Both the ORDER_PRODUCT and the PRODUCT relations are already in 3NF.

ORDER_PRODUCT(Order#, Product#, Quantity)
PRODUCT (Product#, Description, Unit Price)

Example Solution: Final Relations in 3NF and BCNF

ORDER
(Order#, Customer#, OrderDate)

CUSTOMER
(Customer#, CustomerName, CustomerAddress, City, State, PostCode)

ORDER_PRODUCT
(Order#, Product#, Quantity)

PRODUCT
(Product#, Description, UnitPrice)
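If you want to try the final design out, the sketch below builds the four relations in an in-memory SQLite database using Python's standard `sqlite3` module. The table and column names (`orders`, `order_no`, and so on) and the column types are my own assumptions: `#` is awkward in SQL identifiers and ORDER is a reserved word, so the names had to change slightly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    customer_no      INTEGER PRIMARY KEY,
    customer_name    TEXT NOT NULL,
    customer_address TEXT,
    city             TEXT,
    state            TEXT,
    postcode         TEXT
);
CREATE TABLE orders (
    order_no    INTEGER PRIMARY KEY,
    customer_no INTEGER NOT NULL REFERENCES customer(customer_no),
    order_date  TEXT NOT NULL
);
CREATE TABLE product (
    product_no  INTEGER PRIMARY KEY,
    description TEXT NOT NULL,
    unit_price  REAL NOT NULL
);
CREATE TABLE order_product (
    order_no   INTEGER NOT NULL REFERENCES orders(order_no),
    product_no INTEGER NOT NULL REFERENCES product(product_no),
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (order_no, product_no)
);
""")

# No insertion anomaly: a product can exist before anyone orders it.
conn.execute("INSERT INTO product VALUES (1, 'Widget', 5.0)")

# The FK link is maintained: an order for an unknown customer is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (100, 999, '2024-01-01')")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

print(fk_enforced)  # True
```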

Now an example for everyone at home to try:


Here is a suggested solution:

1NF: Identify the PK.

PATIENTHISTORY(PatientNo, name, address, suburb, date, time, drNo, drName, visitCode, description)

2NF: Remove partial dependencies. The PK is currently the combination of PatientNo, date and time. The attributes name, address and suburb depend on only PART OF THE KEY (PatientNo), so they move to a new relation. drNo is determined by PatientNo, date and time together, so it is not a partial dependency, and the same goes for visitCode. We will deal with drName and description shortly.

PATIENT (PatientNo, name, address, suburb)

PATIENT_HISTORY(PatientNo, date, time, drNo, drName, visitCode, description) --> Note that PatientNo is now a foreign key as well as part of the key.

3NF: Remove transitive dependencies. Notice that drName is dependent on the non-key attribute drNo. The same principle applies to description, which is determined by visitCode.

DOCTOR (drNo, drName)

ILLNESS (visitCode, description)

PATIENT (PatientNo, name, address, suburb)

PATIENT_HISTORY (PatientNo, date, time, drNo, visitCode)  --> drNo and visitCode are foreign keys (pointing to the DOCTOR and ILLNESS tables, respectively).
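A handy way to double-check the choice of key is to compute an attribute closure: start from the candidate key and keep applying FDs until nothing new is added. The sketch below does this for the FDs used in the suggested solution; the function name `closure` and the way the FDs are written down are just one possible encoding.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs.

    fds is a list of (determinant, dependents) pairs of sets.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for determinant, dependents in fds:
            # Apply an FD once its whole determinant is already known.
            if determinant <= result and not dependents <= result:
                result |= dependents
                changed = True
    return result


# FDs taken from the suggested solution above.
fds = [
    ({"PatientNo"}, {"name", "address", "suburb"}),
    ({"PatientNo", "date", "time"}, {"drNo", "visitCode"}),
    ({"drNo"}, {"drName"}),
    ({"visitCode"}, {"description"}),
]

all_attributes = {
    "PatientNo", "name", "address", "suburb", "date", "time",
    "drNo", "drName", "visitCode", "description",
}

# The full key determines every attribute of PATIENTHISTORY ...
print(closure({"PatientNo", "date", "time"}, fds) == all_attributes)  # True

# ... while PatientNo alone only reaches the patient details.
print(sorted(closure({"PatientNo"}, fds)))
# ['PatientNo', 'address', 'name', 'suburb']
```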

The definitive guide to the OSI and TCP/IP Model (Layer 1 – Physical Layer)

Introduction

After speaking to many DBAs it seems that the OSI model is the one topic that causes the most grief when it comes to understanding how networks work. In this post I will attempt to provide some background knowledge primarily focusing on Layer 1 – the Physical Layer. I hope you can use this post to further improve your understanding.

Layered Models:

The IT industry uses layered models to describe the complex process of network communication. Protocols for specific functions in the process are grouped by purpose into well-defined layers. By breaking the network communication process into manageable layers, the industry can benefit in the following ways:

■ Defines common terms that describe the network functions to those working in the industry and allows greater understanding and cooperation.
■ Segments the process to allow technologies performing one function to evolve independently of technologies performing other functions. For example, advancing technologies of wireless media is not dependent on advances in routers.
■ Fosters competition because products from different vendors can work together.
■ Provides a common language to describe networking functions and capabilities.
■ Assists in protocol design, because protocols that operate at a specific layer have defined information that they act upon and a defined interface to the layers above and below.

What is the OSI Model?

The Open Systems Interconnection (OSI) model provides an abstract description of the network communication process. It was developed by the International Organization for Standardization (ISO) to provide a road map for nonproprietary protocol development, but it did not evolve as readily as the TCP/IP model.

In a nutshell, the communication process begins at the application layer of the source, and the data is passed down to each layer to be encapsulated with supporting data until it reaches the physical layer and is put out on the media. When the data arrives at the destination, it is passed back up through the layers and decapsulated by each layer (decapsulation is the process of stripping off one layer’s headers and passing the rest of the packet up to the next higher layer on the protocol stack).

In other words, for application data to travel uncorrupted from one host to another, a header (or control data), which contains control and addressing information, is added to the data as it moves down the layers. The process of adding control information as the data passes through the layered model is called encapsulation. To reiterate, decapsulation is the process of removing the extra information and sending only the original application data up to the destination application layer.

Each layer adds control information at each step. Each layer provides data services to the layer directly above by preparing information coming down the model or going up. The generic term for data at each level is protocol data unit (PDU).
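Encapsulation and decapsulation can be sketched in a few lines. This is a toy model only: the bracketed text headers and the three-layer list are invented for illustration, and real protocol headers are binary structures with many fields.

```python
# Layers that add a header, in the order the data passes down the stack.
LAYERS = ["transport", "network", "data link"]


def encapsulate(data: bytes) -> bytes:
    """Wrap application data with one toy header per layer, top to bottom."""
    pdu = data
    for layer in LAYERS:
        pdu = f"[{layer}]".encode() + pdu
    return pdu


def decapsulate(pdu: bytes) -> bytes:
    """Strip the toy headers in reverse order, bottom to top."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode()
        if not pdu.startswith(header):
            raise ValueError(f"missing {layer} header")
        pdu = pdu[len(header):]
    return pdu


frame = encapsulate(b"hello")
print(frame)               # b'[data link][network][transport]hello'
print(decapsulate(frame))  # b'hello'
```

Notice that the outermost header belongs to the lowest layer, which is why the receiver has to remove headers in the opposite order to the one in which the sender added them.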

The OSI model is used to reference the process of communication, not to regulate it. Many protocols in use today apply to more than one layer of the OSI model. This is why some of the layers of the OSI model are combined in the TCP/IP model. Which leads us to…

The TCP/IP Model:

The TCP/IP model evolved faster than the OSI model and is now more practical in describing network communication functions. The OSI model describes in detail functions that occur at the upper layers on the hosts, while networking is largely a function of the lower layers.

When juxtaposed, you can see that the functions of the application, presentation, and session layers of the OSI model are combined into one application layer in the TCP/IP model. The bulk of networking functions reside at the transport and the network layers, so they remain individual layers. TCP operates at the transport layer, and IP operates at the Internet layer. The data link and physical layers of the OSI model combine to make the network access layer of the TCP/IP model.
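The mapping described above is small enough to write down as a lookup table. The snippet below simply restates that paragraph as data and groups the OSI layers by the TCP/IP layer they fall into; the variable names are my own.

```python
# OSI layer -> TCP/IP layer, listed from the top of the stack down.
OSI_TO_TCPIP = {
    "application": "application",
    "presentation": "application",
    "session": "application",
    "transport": "transport",
    "network": "internet",
    "data link": "network access",
    "physical": "network access",
}

# Group the seven OSI layers under the four TCP/IP layers.
tcpip_layers = {}
for osi_layer, tcpip_layer in OSI_TO_TCPIP.items():
    tcpip_layers.setdefault(tcpip_layer, []).append(osi_layer)

for tcpip_layer, osi_layers in tcpip_layers.items():
    print(f"{tcpip_layer}: {', '.join(osi_layers)}")
```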

So what’s the purpose of the physical layer in the OSI model?

The role of the OSI physical layer is to encode the binary digits that represent data link layer frames into signals and to transmit and receive these signals across the physical media (copper wires, optical fiber, and wireless) that connect network devices. The data link frame that comes down to the physical layer contains a string of bits representing application, presentation, session, transport, and network information. These bits are arranged in the logical order required by the specific protocols and applications that use them. These bits must travel over a physical medium such as copper cable or glass fiber-optic cable, or wirelessly through the air.

The physical medium is capable of conducting a signal in the form of voltage, light, or radio waves from one device to another. It is possible that the media will be shared by traffic from many protocols and subjected to physical distortions along the way. Part of the physical layer design is to minimize the effects of this overhead and interference.

The delivery of frames across the local media requires the following physical layer elements:
■ The physical media and associated connectors
■ A representation of bits on the media
■ Encoding of data and control information
■ Transmitter and receiver circuitry on the network devices

After the signals traverse the medium, they are decoded to their original bit representations of data and given to the data link layer as a complete frame.
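To give a feel for what "encoding bits into signals" means, here is a sketch of Manchester encoding, one well-known line code in which every bit is sent as a transition in the middle of the bit period. This post does not single out any particular encoding scheme, so treat Manchester purely as an illustrative choice. The sketch follows the convention usually associated with IEEE 802.3 (a 1 is low then high, a 0 is high then low) and models signal levels as the integers 0 and 1.

```python
def manchester_encode(bits):
    """Encode a list of 0/1 bits as a list of half-bit signal levels."""
    signal = []
    for bit in bits:
        # 1 -> low then high, 0 -> high then low.
        signal.extend([0, 1] if bit else [1, 0])
    return signal


def manchester_decode(signal):
    """Recover the original bits from a list of half-bit signal levels."""
    if len(signal) % 2:
        raise ValueError("signal must contain an even number of half-bits")
    bits = []
    for first, second in zip(signal[0::2], signal[1::2]):
        if first == second:
            raise ValueError("missing mid-bit transition")
        bits.append(1 if (first, second) == (0, 1) else 0)
    return bits


signal = manchester_encode([1, 0, 1, 1])
print(signal)                     # [0, 1, 1, 0, 0, 1, 0, 1]
print(manchester_decode(signal))  # [1, 0, 1, 1]
```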

When the physical layer puts a frame out onto the media, it generates a set pattern of bits, or signal pattern, that can be understood by the receiving device. The patterns are organized so that the device can tell when a frame begins and when it ends. Without the signal pattern, the receiving device will not know when the frame ends, and the transmission will fail.

The physical layer performs functions very different from the other OSI layers. The upper layers perform logical functions carried out by instructions in software. The upper OSI layers were designed by software engineers and computer scientists who designed the services and protocols in the TCP/IP suite as part of the Internet Engineering Task Force (IETF). By contrast, the physical layer, along with some similar technologies in the data link layer, defines hardware specifications, including electronic circuitry, media, and connectors. Instead of software engineers, the physical layer specifications were defined by electrical and communications engineering organizations.

tl;dr:

OSI Layer 1 takes data link layer frames and encodes the data bits into signals that travel copper, fiber-optic, or wireless media to the next device, where they are decoded and sent back up to the data link layer.

Copper cable, fiber-optic cable, and wireless media have varying performance benefits and costs that determine their use in a network’s infrastructure. Physical layer equipment standards describe the physical, electrical, and mechanical characteristics of the physical media and the connectors used to connect media to devices. These standards are under constant review and are updated as new technologies become available.