Monday 7 July 2014

Slowly Changing Dimensions

Slowly Changing Dimensions (SCD):
Slowly changing dimensions are the dimensions in which the data changes slowly, rather than changing regularly on a time basis. Slowly changing dimensions (SCD) determine how the historical changes in the dimension tables are handled. 
Generally we use following three types of slowly changing dimensions like: SCD type1, SCD type 2, SCD type3.
SCD Types:
a.      SCD type 1
It is used when there is no need to store historical data in the dimension table. This method overwrites the old data in the dimension table with the new data. It is used to correct data errors in the dimension.
E.g. If you entered address for employee is Mumbai and after some time the employee address is changed then you can use SCD Type1.
EmpNo
EmpName
Address
1001
Sagar
Mumbai(India)

If new address is Pune then you can overwrite old one.
EmpNo
EmpName
Address
1001
Sagar
Pune(India)

b.      SCD type 2
SCD type 2 stores the entire history the data in the dimension table. With type 2 we can store unlimited history in the dimension table. In type 2, you can store the data in three different ways.
They are:
ü  Versioning
ü  Flagging
ü  Effective Date
SCD type 2 Versioning: In versioning method, a sequence number is used to represent the change. The latest sequence number always represents the current row and the previous sequence numbers represents the past data.
For the same example we can use SCD Type2

EmpNo
EmpName
Address
Version
1001
Sagar
Mumbai(India)
1

If new address is Pune then you can add new version data as:
EmpNo
EmpName
Address
Version
1001
Sagar
Mumbai(India)
1
1001
Sagar
Pune(India)
2

If we add new location then version no also changes.

SCD type 2 Flagging: In flagging method, a flag column is created in the dimension table. The current record will have the flag value as 1 and the previous records will have the flag as 0.

EmpNo
EmpName
Address
Flag
1001
Sagar
Mumbai(India)
1

If new address Pune is added then the old records will be updated with flag value as 0 and the latest record will have the flag value as 1.

EmpNo
EmpName
Address
Flag
1001
Sagar
Mumbai(India)
0
1001
Sagar
Pune(India)
1

SCD type 2 Effective Date: In Effective Date method, the period of the change is tracked using the start_date and end_date columns in the dimension table.

EmpNo
EmpName
Address
Start_date
End_date
1001
Sagar
Mumbai(India)
01-Mar-2010
20-Feb-2011
1001
Sagar
Pune(India)
21-Feb-2011
NULL

The NULL in the End_Date indicates the current version of the data and the remaining records indicate the past data. 

c.       SCD type 3
Only the current status and previous status of the row is maintained in the table. To track these changes two separate columns are created in the table. It maintains the most recently used record history.
EmpNo
EmpName
New address
Old address
1001
Sagar
Mumbai(India)
NULL

If you add new address Pune then data stored should look like
EmpNo
EmpName
New address
Old address
1001
Sagar
Pune(India)
Mumbai(India)


The type 3 method will have limited history and it depends on the number of columns you create.


No comments:

Post a Comment