Saturday, December 8, 2012

What is normalization ?

What is normalization?

         Database normalization is a data design and organization process applied to data structures based on rules that help build relational databases. In other words, process of decomposing a complex relation into a simple and stable data structure.
Normalization usually involves dividing a database into two or more tables and defining relationships between the tables. 

Benefits of normalization:
1.   Minimize data redundancy.
2.   Reducing required storage space
3.   Preventing the need to restructure existing tables to accommodate new data.
4.   Increased speed and flexibility of queries, sorts, and summaries.

First Normal Form (1NF)

A relation schema is in 1NF:
– If and only if all the attributes of the relation R are atomic in nature.
– Atomic: the smallest level to which data may be broken down and remain meaningful
  • Eliminate duplicative columns from the same table.
  • Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
Second Normal Form (2NF)

A Relation is said to be in Second Normal Form if and only if:
It is in the First normal form, and
No partial dependency exists between non-key attributes and key attributes.
• An attribute of a relation R that belongs to any key of R is said to be a
prime attribute and that which doesn’t is a non-prime attribute
  • Meet all the requirements of the first normal form.
  • Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
  • Create relationships between these new tables and their predecessors through the use of foreign keys.
Third Normal Form (3NF)

A relation R is said to be in the Third Normal Form (3NF) if and only if
It is in 2NF and
No transitive dependency exists between non-key attributes and
key attributes.
  • Meet all the requirements of the second normal form.
  • Remove columns that are not dependent upon the primary key.
Boyce-Codd Normal Form (BCNF or 3.5NF)

The Boyce-Codd Normal Form, also referred to as the "third and half (3.5) normal form", adds one more requirement:
- if and only if all the determinants are candidate keys.
BCNF relation is a strong 3NF, but not every 3NF relation is BCNF.
  • Meet all the requirements of the third normal form.
  • Every determinant must be a candidate key.
Fourth Normal Form (4NF)

Finally, fourth normal form (4NF) has one additional requirement:
  • Meet all the requirements of the third normal form.
  • A relation is in 4NF if it has no multi-valued dependencies.
Keys:

1.   Super Key:
An attribute or group of attributes, that is sufficient to distinguish every tuple in the relation from every other one.
Each Superkey is called a candidate key.
Any superset of a candidate Key is a super key.
A candidate key is all those set of attributes which can uniquely identify a row. However, any subset of those set of attributes would not identify a row uniquely.

2.   Candidate Key:
A candidate key can be any column or a combination of columns that can be qualify as a unique key in database.
There can be multiple candidate keys in one table. Each candidate key can qualify as s primary key.

3.   Primary key:
A primary key is a column or combination of columns that uniquely identify a record.
Only one candidate key can be a primary key.
        A primary key which is a combination of more than one attribute is called a Composite primary key.

4.   Foreign Key:
A foreign key is a “copy” of a primary key that has been exported from one relation into another to represent the existence of a relationship between them.
Overlapping candidate keys: Two candidate keys overlap if they involve any attribute in common.
Attribute that does not participate in any candidate key is called a Non-key attribute

No comments:

Post a Comment