Normalisation

Key concepts

Superkey: a set of attributes such that no two tuples have the same values for these attributes

(Candidate) key: a minimal superkey where none of the attributes can be removed to create another superkey.

Primary key: one of the selected candidate keys

Prime attribute: an attribute in any candidate key

Non-prime attribute: An attribute that is not a member of any candidate key


Example

R[A, B, C, D, E]
A → B​
C → {D, E}​
{A, D, E} → C​
{B, C} → A

Candidate Key(s)={{A,C},{A,D,E},{B,C}}\text{Candidate Key(s)} = \{ \{A, C\}, \{A, D, E\}, \{B, C\} \}

Superkey(s)={{A,C},{A,C,E,},{A,B,C,D,E},...}      (Anything that includes at least the candidate key)\text{Superkey(s)} = \{ \{A, C \}, \{A, C, E, \}, \{A, B, C, D, E \}, ... \} \;\;\; \text{(Anything that includes at least the candidate key)}

Prime Attributes={A,B,C,D,E}\text{Prime Attributes} = \{ A, B, C, D, E \}

Non Prime Attributes={ϕ}\text{Non Prime Attributes} = \{ \phi \}


Normalization: the process of identifying redundancy from data

Normalization is a process that aims at achieving better designed relational database schemas using

The normalization process takes a relational schema through a series of tests to certify whether it satisfies certain conditions

Normalisation Overview

Form Description
1NF Outcome: Identifying non-atomic values from relations.
Test: Relation should have no multivalued attributes or nested relations.
2NF Outcome: Identifying partial dependencies, which helps remove some anomalies.
Test: LHS of any non-trivial FD in F+ is not a proper subset of a candidate key, or RHS is a prime attribute
3NF Outcome: Identifying partial and transitive dependencies, which helps remove most anomalies
Test: LHS of any non-trivial FD in F+ is a superkey, or RHS is a prime attribute.
BCNF Outcome: Identifying all anomalies at the cost of not preserving all FDs
Test: LHS of any non-trivial FD in F+ is a superkey.

1NF

Non-1NF relations Normalised to 1NF
alt text alt text

2NF

A relation schema R is in 2NF if every non-prime attribute A in R is fully functionally dependent on the primary key of R.

alt text

More Formally

alt text


3NF

alt text

alt text

In other words, a relation is in 3NF iff for any non-trivial FD X \rightarrow A, where A is a non-prime attribute, X must be a superkey


BCNF (Boyce-Codd Normal Form)

alt text

Informally: Whenever a set of attributes of R determine another attribute, it should determine all the attributes of R.

alt text

Relational Database Schema Design

alt text


BCNF synthesis

alt text

3NF synthesis

alt text

Finding minimal cover:

alt text

alt text

Synthesis alt text