Saturday, 19 January 2013

Implementing Slowly Changing Dimension Type 3 (SCD 3 ) with SSIS

This article defines about Slowly Changing Dimension 3 ,the uses of slowly changing dimension Type 3 and its implementation using SSIS .


Slowly Changing Dimension 3:

Some times in business ,customers regional grouping changes from one region to another region over the time ,the requirement for analyses of the complete data by the new region and the analyses of the complete data by the old region is necessary , scd type 3 will make this possible


Imagine a customer who belongs to a West region group has been moved to Northwest group over the time, and we want to analyse all the sales data by West and as well as Northwest group


Type 1 doesn't meet the requirement as it updates all the old values with the new values ,in this case all the values will be updated from West to Northwest for that customer


Type 2 doesn't meet the requirement either as it associates all the old values with West region and the new values with the Northwest


The Customers source data which i have prepared for this demo contains customers information with there regions information and it looks like as shown below

Fact Source data looks like this:

I have created three separate packages one for loading customer dimensio,n another for loading fact sales , and a master package ,the date dimension is a standard data dimension and have used from AdventureWorksDW2012 DB .


The customer dimension control flow looks like this

The customer dimension Data flow looks like this

The data flow starts by extracting the data from customer source ,the extracted records are referenced with the customer dimension to check for new customers and for existing customers we will pickup the current region and previous region fields


Customers mapping of the new customers looks like below image,as we notice we don't map previous region, as there will be no previous region for new customers

In the next step I'm using conditional split to check the change of regions as shown below

At this stage i'm collecting the regions which have changed in a staging table and updating it using Execute Sql Task as shown below
As you can see in the query what I'm doing is updating the current region with the new region and previous region with the current region . After i run the master package ,the customer dimension and factsales will be inserted with the initial regional values as shown below and at this stage we have only current regions


I'm inserting a new customer record and its associated two sales for which there is a change of region as shown below


The customer dimension and factsales after the insertion of new records is shown below


As you can see the change of the customer region from West to North West ,and the associated fact records for the customer tony ,now we can analyse the data both with the new and old region .


One key point about SCD3 is that, the history will be reinstated from second change as we are holding only one column for previous history,if you need to analyse data with more changes ,than you need to include more columns and change the etl logic as necessary

No comments:

Post a Comment

Note: only a member of this blog may post a comment.

Power BI Report Server

Power BI Report Server  is a new product which can be used to host both Power BI reports and SSRS reports. Currently beta version is ava...