Database Normalisation

Database Normalisation

Normalisation is a database design technique for organising data in a relational database. Normalisation is important as it reduces data redundancy & duplicated data and tries to ensure that only relevant (related) data is stored in each database table. By making sure that the database is normalised you eliminate undesirable characteristics like Insertion, Deletion and Update Anomalies.

Imagine your database being a fortress, you don't want redundant data, erroneous bad data or duplicated data entering your database system. Essentially, the user of the database should have a guarantee to 'trust' the data!

db-normalisation.png

To understand database normalisation, it is probably easier to understand some of the basic mistakes newbies make when designing their database.

  • Design mistake 1 - keys!

A common mistake is not implementing the correct primary, candidate, unique and foreign keys ...

  • Design mistake 2 - Not using atomic (single) values for the attributes

Another lazy approach is to put multiple values into single columns. It is best to keep attributes atomic!

fNamelNameskills
janedoeC++, Python, SQL
joeblogsJava, Django
pedbadTeaching, JavaScript

and don't create redundancy problems by creating repeating groups, eg skill-1, skill-2 etc..., this is not only bad but also not flexible, what happens when jane learns SQL, you have to go back and redisign your database!

fNamelNameskill-1skill-2skill-3
janedoeC++PythonSQL
joeblogsJavaDjangonull
pedbadTeachingJavaScriptnull

A good rule is that one column (attribute) should only store a single or atomic value, that is a value which cannot be divided any further...

fNamelNameskills
janedoeC++
janedoePython
janedoeSQL
joeblogsJava
joeblogsDjango
pedbadTeaching
pedbadJavaScript

In the table above we have atomic values, but the name fields are duplicates. To resolve this, we can split the table into smaller tables, a process called normalisation...


Normalisation is a technique of organising the data into multiple related tables, to minimise Data Redundancy.

Data redundancy is just repetition of similar data in multiple places in your database tables. This Repetition increases the size of your database using up more of your computer resources, it also makes it more difficult to manage and update your database and may actually risk data loss! this repetition can also cause you other issues like:

  • Insertion Anomalies
  • Deletion Anomalies
  • Updating Anomalies

> Insertion, Deletion and Updating Anomalies are very frequent if database is not normalised.