Module 4: Database design theory and Normalisation

Design Guidelines

Guideline 1: Design each relation so that it is easy to explain its meaning

Using meaningful names
Do not combine attributes from multiple entity types and relationship types into a single relation

Redundant values in Tuples

One design goal is to minimise the storage space that base relations occupy.
In addition, an incorrect grouping may cause update anomalies which may result in inconsistent data or even loss of data.

Consider a company where an employee’s salary is determined by the level or position they hold. For example, a manager has a fixed salary of $700,000 and a developer has a fixed salary of $60,000.
- That is, the level of the employee determines the salary of the employee.

This dependency is written as:

level $\rightarrow$ salary

Modification Anomalies

Updating the salary of only one developer makes the “Developer” salary inconsistent.

Deletion Anomalies

Deleting the only employee at a given level also deletes the only record of the salary for that level.

Insertion Anomalies

We cannot store the salary of a “Cook” if no employee has that position.
Inserting a new row with a different salary for a developer makes the “Developer” salary inconsistent.
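
The anomalies above can be reproduced in a small sketch. The table contents, the employee names, and the helper `salaries_by_level` are hypothetical, chosen only to mirror the manager/developer example.

```python
# Hypothetical single-relation design: (name, level, salary) in one table.
employees = [
    ("Alice", "Manager", 700_000),
    ("Bob", "Developer", 60_000),
    ("Carol", "Developer", 60_000),
]

def salaries_by_level(rows):
    """Collect the set of distinct salaries recorded for each level."""
    result = {}
    for _name, level, salary in rows:
        result.setdefault(level, set()).add(salary)
    return result

# Modification anomaly: changing one developer's row leaves two
# different salaries on record for the same level.
updated = [("Bob", "Developer", 65_000) if row[0] == "Bob" else row
           for row in employees]
inconsistent = {lvl for lvl, s in salaries_by_level(updated).items() if len(s) > 1}

# Deletion anomaly: removing the only manager also removes the only
# record of what a manager is paid.
after_delete = [row for row in employees if row[0] != "Alice"]
manager_salary_known = "Manager" in salaries_by_level(after_delete)

print(inconsistent)          # {'Developer'}
print(manager_salary_known)  # False
```

Because the salary is repeated in every employee row, each of these operations has to be handled carefully by the application to keep the data consistent.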

Guideline 2: Design the base relation schema so that no insertion, deletion, or modification anomalies occur in the relations

If any do occur, ensure that all applications that access the database update the relations in such a way as to not compromise the integrity of the database

Guideline 3: As far as possible, avoid placing attributes in a base relation whose values may be null

Null values waste storage space, introduce ambiguity, and cannot be used for comparison
If nulls are unavoidable, make sure that they apply only in exceptional cases, not to most tuples in the relation

Decomposition

A decomposition of R replaces R by two or more relations such that:
- Each new relation contains a subset of the attributes of R (and no attributes not appearing in R)
- Every attribute of R appears in at least one new relation.
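
As a rough illustration of the definition, the sketch below decomposes a relation by projection. The instance `R`, the attribute names, and the helper `project` are assumptions made for this example, not part of the course material.

```python
def project(rows, attrs):
    """Project a relation (list of dicts) onto attrs, dropping duplicate tuples."""
    seen = set()
    out = []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

# Hypothetical instance of R(name, level, salary).
R = [
    {"name": "Alice", "level": "Manager", "salary": 700_000},
    {"name": "Bob", "level": "Developer", "salary": 60_000},
    {"name": "Carol", "level": "Developer", "salary": 60_000},
]

# Decompose R into R1(name, level) and R2(level, salary).
R1 = project(R, ["name", "level"])
R2 = project(R, ["level", "salary"])

# Both conditions of the definition: each new schema is a subset of R's
# attributes, and together they cover every attribute of R.
r_attrs = {"name", "level", "salary"}
schemas = [{"name", "level"}, {"level", "salary"}]
is_valid = all(s <= r_attrs for s in schemas) and set().union(*schemas) == r_attrs

print(len(R1), len(R2), is_valid)  # 3 2 True
```

After the decomposition, the developer salary is stored once in `R2` instead of once per developer.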

Join operation

Definition: $R_1 \bowtie R_2$ is the natural join of the two relations

Each tuple of $R_1$ is concatenated with every tuple in $R_2$ having the same values on the common attributes
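
A minimal sketch of this definition, assuming relations are represented as lists of dicts that all share the same attribute names. The function name `natural_join` and the sample data are hypothetical, and duplicate elimination is left out for brevity.

```python
def natural_join(r1, r2):
    """Natural join of two relations given as lists of dicts."""
    if not r1 or not r2:
        return []
    # Attributes that appear in both relations.
    common = [a for a in r1[0] if a in r2[0]]
    joined = []
    for t1 in r1:
        for t2 in r2:
            # Concatenate the tuples only when they agree on every common attribute.
            if all(t1[a] == t2[a] for a in common):
                joined.append({**t1, **t2})
    return joined

emp = [
    {"name": "Alice", "level": "Manager"},
    {"name": "Bob", "level": "Developer"},
]
pay = [
    {"level": "Manager", "salary": 700_000},
    {"level": "Developer", "salary": 60_000},
]

result = natural_join(emp, pay)
for row in result:
    print(row)
```

When the two relations have no attribute in common, the condition is trivially true and the sketch returns every pairing of tuples, which matches the usual behaviour of a natural join.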

Lossy Join operation

The word loss in lossless refers to loss of information, not loss of tuples.
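
The point about information versus tuples can be seen in a small sketch. The instance below, including the "Tester" level and the choice to split on salary, is a hypothetical example and not the one from the original figure.

```python
# Hypothetical instance of R(name, level, salary).
R = {
    ("Alice", "Manager", 700_000),
    ("Bob", "Developer", 60_000),
    ("Carol", "Tester", 60_000),
}

# Decompose on the shared attribute salary:
# R1(name, salary) and R2(salary, level).
R1 = {(name, salary) for name, _level, salary in R}
R2 = {(salary, level) for _name, level, salary in R}

# Natural join of R1 and R2 on salary, rebuilt as (name, level, salary).
rejoined = {
    (name, level, s1)
    for name, s1 in R1
    for s2, level in R2
    if s1 == s2
}

# Tuples that appear after the join but were never in R.
spurious = rejoined - R
print(len(R), len(rejoined))  # 3 5
print(sorted(spurious))       # [('Bob', 'Tester', 60000), ('Carol', 'Developer', 60000)]
```

The join returns more tuples than the original relation, yet information has been lost: from the rejoined result we can no longer tell whether Bob is a developer or a tester.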

Why is this a lossy join?

Databases allow you to say that one attribute determines another through a functional dependency

A functional dependency (FD) $X \rightarrow Y$ holds on relation $R$ if, for every legal instance $r$ of $R$ and for all tuples $t_{1}, t_{2}$ in $r$:

$t_{1}[X] = t_{2}[X] \Rightarrow t_{1}[Y] = t_{2}[Y]$

That is, given any two tuples in $r$, if their $X$ values agree, then their $Y$ values must also agree
Example: level $\rightarrow$ salary (i.e., if two employees have the same level, then they must have the same salary)
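
The definition can be checked mechanically on a single instance, as in the sketch below. The helper `fd_holds` and the sample rows are hypothetical. Note that a check like this can only refute an FD: an FD is a statement about every legal instance, so passing on one instance does not prove that it holds.

```python
def fd_holds(rows, X, Y):
    """Return True if the FD X -> Y holds on this instance (list of dicts)."""
    seen = {}
    for t in rows:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        # setdefault stores the first Y value seen for this X value and
        # returns the stored one; a mismatch means two tuples agree on X
        # but disagree on Y.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True

r = [
    {"name": "Alice", "level": "Manager", "salary": 700_000},
    {"name": "Bob", "level": "Developer", "salary": 60_000},
    {"name": "Carol", "level": "Developer", "salary": 60_000},
]

print(fd_holds(r, ["level"], ["salary"]))  # True
print(fd_holds(r, ["salary"], ["name"]))   # False
```

The second check fails because Bob and Carol share a salary but have different names.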

A key is a minimal set of attributes that uniquely identifies each tuple in a relation, i.e., a minimal set of attributes that functionally determines all the attributes in the relation.

A superkey for a relation is a set of attributes that uniquely identifies each tuple in the relation; unlike a key, it need not be minimal.
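
The difference between a key and a superkey can be sketched as follows. The helpers `is_superkey` and `is_key` and the sample rows are hypothetical, and the sketch assumes the instance contains no duplicate tuples. As with FDs, keys are constraints on every legal instance, so this only tests whether a set of attributes behaves like a key on the given data.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two tuples of this instance agree on all of attrs."""
    projected = [tuple(t[a] for a in attrs) for t in rows]
    return len(set(projected)) == len(projected)

def is_key(rows, attrs):
    """True if attrs is a superkey and no proper subset of it is one."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(len(attrs))
        for subset in combinations(attrs, size)
    )

r = [
    {"name": "Alice", "level": "Manager", "salary": 700_000},
    {"name": "Bob", "level": "Developer", "salary": 60_000},
    {"name": "Carol", "level": "Developer", "salary": 60_000},
]

print(is_superkey(r, ["name", "level"]))  # True
print(is_key(r, ["name", "level"]))       # False: name alone already suffices
print(is_key(r, ["name"]))                # True
```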
