In this article we will learn about one new component from SSIS toolbox which is called as SCD (Slowly Changing Dimensions).
Links to other Labs of Learn MSBI Step by Step.
Learn MSBI Part 1:- Creating SQL tables, Practicals of SSIS-ETL(Extraction, Transformation & Loading)
Learn MSBI Part 2:- Performing Validation on Records
Learn MSBI Part 4:- Fact and Dimension.
Learn MSBI Part 5:- CUBE Creation
Learn MSBI Part 6:- Displaying reports using SSRS
Learn MSBI Part 7:- DQS (Data Quality Services)
Learn MSBI Part 8:- Data Profiling Task
Learn MSBI Part 9:- Pivot, UnPivot and Aggregation
Learn MSBI Part 10:- Execute SQL Task component (SSIS)
Learn MSBI Part 11:- Send Mail Task & Script Task Component
Learn MSBI Part 12:- CDC Concept using SSIS
Learn MSBI Part 13:- Asynchronous, Synchronous, Full-blocking, Semi-blocking & No-blocking
Learn MSBI Part 14:- Understanding Page Split and maintaining SSIS Performance
Learn MSBI Part 15:- Debugging in deployment – Instrumentation using Data Taps
Learn MSBI Part 16:- Using Multi-threading and its performance in SSIS program
We will learn about one new component from SSIS toolbox which will handle the situation in order to update data if it already existing and changed/altered or make a fresh entry if it is being coming for the first time. This component is called SCD (Slowly Changing Dimensions) available under SSIS Toolbox under Common –> Slowly Changing Dimensions.
Purpose behind Microsoft naming it to Slowly Changing Dimension(SCD) is that it is used where there is no frequent i.e. less updating of data like Master Data which are less likely to be updated.
Please note: Avoid it using on frequent and large number of data record changes which will result in slow down of application.
Following is the diagram which depicts the same in the form of flowchart to get scenario cleared in more detailed manner.
Scenario 1 “Insert”: When data coming from CSV file will be entered as new entry into database if it is coming for the first time.
Scenario 2 “Update”: When existing data but changed/altered then will be updated in the database.
With the help SCD component we will accomplish above scenarios and will see practical steps to achieve the same.
The idea or basics of SCD is that it will work on condition that primary key will never change.
Starting with Scenario 1 we will enter CSV called “State.txt” data into database table called “dbo.tblState”
- Drag and drop Flat File Source from SSIS Toolbox on Data Flow.
- Double click and edit it using Editor and click on Connection Manager.
- Now click on New Connection Manager.
- Give Connection manager suitable name as “StateConnection”.
- Browse for path of CSV file from your local system.
- Once path is given then check it by clicking on “Columns’ tab.
- Now drag & drop ADO.NET Destination and configure it.
- Once ADO.NET Destination Editor is opened select dataware house & database called CustomerDataWareHouse & table “dbo.tblState”
- Now check the Mappings for Input & Destination before you click OK.
Now press ctrl+F5 on keyboard or click on the Start button available on the top menu to run application in debug mode.
Once debug mode is successfully executed then open SQL Server Management Studio and execute command “Select Top 1000 Rows” to display the content of the table called “dbo.State” as shown in the following image
After completion of Scenario 1 now we will move Scenario 2 and do the practical for it.
Now in the second scenario we will introduce SCD component to fulfil our need that changed/altered existing data will be updated in the database or it will remain as it if unchanged.
For this just click on SCD(Slowly Changing Data) available on Common under SSIS Toolbox and pull on Data Flow.
Drag blue arrow output of “States” and drop it to input to SCD(Slowly Changing Data)
Delete ADO.NET connection as we will need it no more. Here we have deleted ADO.NET connection because SCD will itself write all the logic and get connected to SQL Server.
Once you double click SCD or do right click “Edit” it will open a wizard to configure Slowly Changing Dimension. This wizard will guide and help you to configure SCD. Click Next.
Next screen of the wizard will come where you have to give table name on which SCD will work. Here you will see “Configure OLE DB Connection Manager” window click on “New” which will open “Connection Manager” window as you can see in the following screen. Here add new Server name “KHADAK-PC”
Please Note: If you want to get Server Name then click on File –> Connect Object Explorer on Microsoft SQL Server Management Studio.
And next is to select database name here we have database name called “CustomerDataWareHouse” and click on OK.
After you click OK under “Data connections” you will see Configure OLE DB Connection Manager as you see in the following image then click on OK.
On Connection Manager you will see Server and Database name now just select Table name [dbo].[table]
Now you will be able to see Input Column and Dimension Column where you can select it from drop down.
Once you set Input Column and Dimension Column at the bottom you will see a warning message which states that DataTypes of Input Column(CSV file) which is “string” as we have already discussed in our earlier labs that default DataType of CSV file which is default “string” in nature and it doesn’t match with Dimension(SQL Server Database table column) which is “int”.
The following image which is representing reason for mismatching of datatypes on the left side of image you will see CSV file with “string” data type and on the right side database table with “int” data type.
So now we will introduce one new data component on data flow pane in order to do conversion of data type from string to int datatype to remove mismatch.
Steps: 1) From SSIS toolbox drag “Data Conversion” available under Common and drop on data flow.
2) Drag the output of States component and drop it as input to Data Conversion component.
3) Now configure this new component by doing right click on the component or simply
double click on the component to open component editor.
1) Once you see editor screen in front of you now tick on the “StateId” which is “Available Input Column”.
2) Once “StateId” is selected as Input Column now set its Output Alias with suitable name “intStateId”.
3) Now very important set its Data Type from dropdown as “four-byte signed integer [DT_I4]” which is “integer” or “int”.
Once the Data Conversion component is configured now
- Drag the output of Data Conversion as input to SCD component and double click SCD component to configure component.
- Once double clicked you will see SCD wizard open the select table of database, here we have selected [dbo][tblState]
- Now select Input Columns to “intStatesId”.
- Set Key Type to “Business key”.
- Click “Next”.
Now in the next screen you will see following images shown three options: –
- Fixed Attribute: It will entertain only column with fixed attribute and if any column changes than it will be treated as error and send it back through the red arrow.
- Changing Attribute: This option will overwrite existing values and replace it with the new value. Changing Attribute is denoted by “Type 1” under Change Type.
- Historical Attribute: This option saves updated new data in existing column and moves old/previous data as history. This option is marked as “Type 2” change.
In short Change Type – “Type 1” will not maintain history and “Type 2” will maintain history
Once you click Finish you will see two output given by Slowly Changing Dimension(SCD): –
- Records which does not exist and will get created as new records and go to ‘Insert Destination” through blue arrow of “New Output” component.
- Existing records in the database will be updated and it will go through blue arrow “Changing Attributes Updates Output” to OLE DB Command component.
On both the output of SCD you will find red cross mark on the components and if you hover mouse over the red cross mark you will see error which states that conversion of Column ”StateName” cannot be done between unicode and non-unicode string data types.
Our input coming from CSV file is by default non-unicode and our database table is nvarchar(50) which is nothing but Unicode i.e. mismatch of data representation at both ends
Unicode requires 2 bytes of storage which is 256*256 = 65536 it supports non-english language.
Non-unicode require 1 byte of storage which is 256 it supports only English language.
Now we will do data conversion for “CustomerName” column in order to match our data representation at both ends.
Edit again “Data conversion” component to do conversion of input CSV file from non-unicode to unicode
1. check “StateName” under Available Input Columns.
2. Now for converted new column give suitable name here we have given “UStateName”.
3. Set DataType to Unicode string[DT_WSTR].
4. Click OK and close the Data Conversion Transformation Editor.
5. Edit again “Insert Destination” component to re-configure it and change the Mappings as you can see in the following image.
6. Click on Mapping tab, change existing mapping and set Input Column to new column which “UStateName” and unicode.
7. Finally click OK after setting Mapping to close the window.
Now you will find as both input and destination are Unicode characters, components existing red mark has gone as you see in the following image.
Now we will remove the error from OLE DB Command by opening it’s Advanced Editor just do right click on “OLE DB Command” component.
So on Editor screen you will see four tabs go on each tab’s and configure it.
Connection Manager: Here you will find that it has automatically taken database name under Connection Manager and if you wish to change it, that can be done manually and set new connection manager.
Component Properties: Here mention SQL command so write Update syntax here. But here you will find command is already automatically written by SCD. So here first question mark (?) is “Param_0” and then followed by second question mark (?) which is “Param_1”. So followed by as many questions if present those will also be sequenced in the similar manner. Here “Param_0” is StateName and “Param_1” is StateId.
Column Mappings: Now if you go to next tab you will see default Input and Output column so change StateName to new UStateName and map it to Param_0 and let remain StateId to mapped to Param_1.
Input and Output Properties: Remain set it to default properties only.
Please Note:Purpose of adding OLE DB Command is that it does update and delete which regular destination components do not perform it.
As all things are set now we will run the package in debug mode and check the output. After running package in debug mode you will see following screen where all package have been executed successfully with checked green tick mark at the right top-side. On the output of Slowly Changing Dimension (SCD) output blue arrow you will find “New Output (5 rows)” which new records of 5 rows has been successfully entered in database table.
In order to check “Insert Destination” component data which is attached to data base table “dbo.tblState”. Go to SQL Server Management Studio as shown in the following image snapshot which shows successfully 5 new rows data has entered into the database table.
As of for now we have seen demonstration of successful insertion of new records.
Now we will see Update demonstration by updating existing “Assam” state value to “Assamese” in CSV and then see whether the update command is working by keeping other State value unchanged.
once on CSV value is saved with Assam to Assamese then run debug on package then you see 5 rows moving in and after processing in SCD in “Insert Destination” you will find zero records entered as there were no new records added but in “OLE DB Command” as Assam was changed to Assamese hence update was taken place and new 1 row is now updated.
Now if you go to database table you will see new updated value by clicking on Execute as shown in the below image.
Please Note: – SCD works on simple principle that where ID of the table column should remain unchanged.
Historical Attribute – Type 2
Now we will look into another attribute called Historical Attribute which is also called as Type 2 where we can maintain audit trail (audit log) i.e. step by step documented history of a transaction.
In order to achieve that we have to make changes to our database table and in that edit our table “dbo.tblState” in design mode and add one more column to it.
Name that column as “IsNew” and use “Bit” as its DataType which will display Boolean value True or False according to the condition i.e. if the value is New then it will be or else it will be False.
Now to consume this new design structure of the table we need to remove primary key as there will be duplicate ID in the table in order to show audit trail for old and new values.
To use this new table design we have to again reconfigure SCD (Slow Changing Dimensions) for that below demonstrated steps.
Once Edit is clicked Slowly Changing Dimension Wizard will open where we will select Type 2 – Historical Change to maintain audit trail for that you have to follow same wizard steps which we followed earlier
Below image is Screen 1 which will display details with wizard, which serves info and if you do not want see this page again then just tick the checkbox and then click on Next.
In the Screen 2 you will see new field added in the table of “Dimensions Columns” called “Isnew” it will only serve purpose to identify new or old record so it won’t have any Key Type and now click Next for following screen.
In this screen you will see Attribute settings here we will set “Historical Attribute” to “StateName” in order maintain audit trail which is also called as Type 2 change.
Once Dimension Column “StateName” is set to “Historical” from drop down as Change Type then click on “Next” to move to following page.
With this new design table in the database new column “IsNew” which will hold “True” or “False” values. If the value is current or New it will show as “True” and Old or Expiration value it will have value of False as shown in the following image
Let remain next page of Inferred Dimension Members as it is only and just click on Next to move ahead which will show finishing page of the SCD wizard. Just check all the outputs which are configured and shown on this page and finally click on Finish.
Once you click on Finish you will see following component existing along with new component automatically added to DataFlow as you see can in the below image along with red crossed mark on “Insert Destination” which state the same error of conversion of Unicode which we earlier had in this article. Now in order to get rid of this problem we will change the mappings.
To remove red cross marked error just do right click and click on Edit, it will open Editor and on left side click on Mappings and on right side map “UStateName” as Available Input to StateName as shown in the following image and click on OK.
And now we will click “Start” to debug the package and check it is working fine or not. In the following image you will see green color checked on each components which states that each component is running successfully.
And finally we will check our database table
Now change Assamese to Assam on “State” Notepad as shown in the below image
After you do the changes to notepad by changing text Assamese –> Assam, save notepad and again re-run the whole package it will show you the following screen with many more new components automatically added to Data flow task. The newly added component to the package are essential to get our Type 2 result in the output. Here it will maintain existing data as history and will add new record to database.
As notepad contains 5 rows but only 1 row has been changed/altered from Assamese –> Assam so you will find in 1 row being revolved across all component in the below.
In debug mode once you find all component are successful with green mark now go to the database and from our table “dbo.tblState” by doing right click on table and select “Select Top 1000 Rows” you will find old value which is Assamese is marked as False and new value which is added in new row is marked as True.
Also see our below start-up MSBI project series videos: -