In this article we will explain about CDC using SSIS.
Links to other Labs of Learn MSBI Step by Step.
Learn MSBI Part 1:- Creating SQL tables, Practicals of SSIS-ETL(Extraction, Transformation & Loading)
Learn MSBI Part 2:- Performing Validation on Records
Learn MSBI Part 3:- SCD (Slow Changing Data)
Learn MSBI Part 4:- Fact and Dimension.
Learn MSBI Part 5:- CUBE Creation
Learn MSBI Part 6:- Displaying reports using SSRS
Learn MSBI Part 7:- DQS (Data Quality Services)
Learn MSBI Part 8:- Data Profiling Task
Learn MSBI Part 9:- Pivot, UnPivot and Aggregation
Learn MSBI Part 10:- Execute SQL Task component (SSIS)
Learn MSBI Part 11:- Send Mail Task & Script Task Component
Learn MSBI Part 13:- Asynchronous, Synchronous, Full-blocking, Semi-blocking & No-blocking
Learn MSBI Part 14:- Understanding Page Split and maintaining SSIS Performance
Learn MSBI Part 15:- Debugging in deployment – Instrumentation using Data Taps
Learn MSBI Part 16:- Using Multi-threading and its performance in SSIS program
CDC(Changing Data Capture) and SCD(Slow Changing Dimensions) are very much similar as they both match and synch up the data records of source and destination.
SCD(Slow Changing Dimensions) is the same which was done in the 5 labs previously to this. CDC(Changing Data Capture) and SCD(Slow Changing Dimensions) achieve same goals and looks very similar but CDC is a mechanism or functionality which works internally very differently also it meant for different purpose. It synch up the source and destination in an
incremental manner.
CDC is not the part of SSIS it belongs to SQL Server and it has to be enabled of SQL Server. With CDC(Changing Data Capture) as the name state it will help to track activity of the changing data on the table whether
- it can be adding new row to the table,
- deletion of the row or
- updating the row of the table.
It is marked with the notation called _$operation where for delete, add and update is noted with numbers as 1, 2 and 3, 4.
_$operation |
Operation |
1 |
Delete |
2 |
Add |
3 |
Old Value (Statement before update) |
4 |
New Updated Value |
When to use CDC?
- CDC is a concept used with SQL Server and works under SQL Server environment which means that source and destination should be SQL Server. If both source and destination is SQL Server consider using of CDC.
- It does incremental changes i.e. it is being told at source itself what changes done at source and then capture those changes accordingly marked it as delete, add and update with old and new values.
- CDC is also preferred for large number of record updates. If it is compared with SCD where only master table and dimension table can be updated here CDC performs update operation on big tables and values i.e. it works on large number of records as well.
Enable CDC on SQL Server Table
Step 1: Go and enable CDC on the database level.
In order to enable CDC on the database level, go and click on File à New à Database Engine Query it will open editor to write a SQL query.
Once the query pane is open here write the following text and then click on Execute in order to run the query.
EXEC sys.sp_cdc_enable_db
After the query is successful then below within Message it will show successful execution as shown in the image down below.
Now CDC is successfully enabled on the database level.
So before enabling CDC on the table “dbo.SalesCDC” there is a pre-requisite starting SQL Server Agent is mandatory.
Enable SQL Server Agent
For that open RUN command window and in that type the command “services.msc” from the services of the local system on which SQL Server is running.
Step 2: Go and define specific table, in this case table “dbo.SalesCDC” on which CDC is need to enabled.
For that write the following query on the SQL Query pane
EXEC sys.sp_cdc_enable_table
@source_schema = ‘dbo’,
@source_name = N’SalesCDC’,
@role_name = NULL
And then run the query by clicking on the “Execute” button available on the top. After the query is successfully executed it will show successful message at the bottom.
Now go to Object Explorer, under the database do a refresh by doing right click on System Tables a new CDC table is created with name “cdc.dbo_SalesCDC_DT”. This table will capture or track the activity of the “dbo.SalesCDC” table which are newly created, updated or deleted.
Using CDC in SSIS
Once the “cdc.dbo_SalesCDC_DT” table is under the System Table is created. Next is to use CDC with SSIS component which will work only on “cdc.dbo_SalesCDC_DT” table and only tracks for the activity of new statement, deleted statement or update statement. In this case for understanding and learning purpose working with limited number of records will be done but in actual CDC is specially meant for tracing activity of huge records new statement, deleted statement or update statement.
In this lab source table i.e. “cdc.dbo_SalesCDC_DT” table will be synch up with destination table using SSIS component. So main focus would be demonstration of practical steps to achieve the same.
So here below will start the same: –
Step 1: On SQL Server, already source table is available next is to create destination table on SQL Server for that create script of existing source table as there is a need of exact destination table. So using script an exact replica of source table will be created.
Practical Step: On Object Explorer of SQL Server Management Studio do a right click on existing source table click on Script Table as à CREATE To à New Query Editor Window in turn it will open query pane.
Once the query pane is open on it edit existing name and change it dbo.[SalesCDCDest] and then click on “Execute” option available on the top and top bottom it will show the message of creating the destination table successfully.
Then on the Tables do a right click and do refresh in order to see the created destination table with name “dbo.SalesCDCDest”.
Working of CDC component in SSIS
Step 1: Enable CDC(CT table) which is with name “cdc.dbo_SalesCDC_DT” of source table “dbo.SalesCDC” (already done, one time process).
Step 2: All the records or data from source table will be copied to the destination.(one time process)
Step 3: CDC component of SSIS will track for changes in “cdc.dbo_SalesCDC_DT” table of SQL Server which will be ongoing process to track the changes with the help of its very own small SSIS CDC table.
SSIS will create its own SSIS CDC table to know last status of the table records.
Practical of SSIS CDC component:
Practical of SSIS CDC component will be of two phases. First phase will get record from source will be copied to the destination.
Second phase is in which records available in the CDC table of SQL Server will be synched up i.e. changes made will be integrated and it is continuous process.
Starting with Phase 1 complete load:
Step 1: Drag and drop two containers on Control Flow from SSIS Toolbox available under the container.
Step 2: Rename both container one container as “Phase 1 complete load” and the second container as “Phase 2 Incremental load”.
Step 3: Drag and drop Data Flow Task on the first container.
Step 4: Rename Data Flow Task to Complete Data.
Next is double click on the Data Flow Task component which is now renamed to Complete Data so that it can be configured.
Once Data Flow is open, Step 1: drag “ADO.NET Source” from “Other Sources” available under SSIS Toolbox and drop it on Data Flow pane.
Step 2: Now drag “ADO.NET Destination” from “Other Destination” available under SSIS Toolbox and drop it on Data Flow pane.
Step 3: Configure the “ADO.NET Source” by opening Editor of the component.
Step 4: Click on the Connection Manager available tab on the left side.
Step 5: Under Connection Manager select the database name “CustomerDataWareHouse” and select source table name from dropdown which is listed as “dbo.SalesCDC”.
Step 6: Connect both component by dragging output of “ADO.NET Source” and drop as an input to “ADO.NET Destination”.
Step 7: Now configure ADO.NET Destination component by opening its Editor.
Step 8: Click on the Connection Manager available tab on the left side.
Step 9: Under Connection Manager select the database name “CustomerDataWareHouse” and from dropdown select destination table which is named as “dbo.SalesCDCDest”.
Once the destination table is selected just click on the “Mappings” tab to have a look whether source and destination is correctly mapped to each other then click on OK to close the Editor window of the ADO.NET Destination component.
Phase level marking:
Now click on the Control Flow where “Complete Data” i.e. Data Flow Task is present. Next is to do the markings till what level data is being loaded in SSIS when compared with SQL Server table which in this case is named a s “cdc.dbo_SalesCDC_DT” or even it can be termed synchronization is being completed. If this marking or tracking existing record is not done then duplication of data or records will be done again and again.
By marking start & end flag of Phase level, SSIS will come to know level of data is loaded or sync-up in “Complete Data”.
Practical of achieving marking start & end Phase level
Step 1: Drag and drop “CDC Control Task” from “Other Task” available under SSIS Toolbox.
Step 2: Rename this component to Start Flag Phase1.
Step 3: Double click the component to open its Editor window to configure it.
Step 4: Set “CDC control operation” to “Mark initial load start”.
Step 5: click on “New” button to create new variable to save the CDC state, here default name available is “CDC_state” will use the same.
Step 6: Next required is table to store state, to create that new table on database, for click new which will open new window.
Step 7: On this window will find a new query is already written to create table and its table name – “[dbo].[cdc_states]” is written. In order to execute query click “Run”.
Step 8: Click on OK to close the window of CDC Control Task Editor.
Step 9: Connect output of “Start Flag Phase1” component as input to “Complete Data” and connect output of “Complete Data” component as input to “End Flag Phase1”.
Step 10: Open “End Flag Phase1” Editor to configure with similar steps executed above.
Step 11: Here select from dropdown “Mark initial load end” as “CDC control operation”.
Step 12: Set the same variable “User::CDC_State” by selecting it from dropdown.
Step 13: To save the variable select the same table “[dbo].[cdc_states]” from dropdown which was created earlier.
Step 14: Click OK to close the window of CDC Control Task Editor of “End Flag Phase1” component.
Next go and check on the SQL Server under the database of “Object Explorer” with name “[dbo].[cdc_states]” table is successfully created, for that click on refresh to view new table created “[dbo].[cdc_states]”.
Now this table will hold the marking of finished synchronization written by SSIS.
All the things for Phase 1 is completed. Run the package to see whether things are executing fine for that click on start or simply do ctrl+F5 on the keyboard and see the execution result.
Following image shows that all packages have run successful with green mark at the top right side.
Moving further are results being inserted in the SQL Server table, starting with destination table of “dbo.SalesCDCDest” it will show the exact content of source table being copied in it.
Followed to that hen check CDC states table “dbo.cdc_states” here SSIS has done marking of start & end flag and these are can be seen under this table.
Configuring Incremental Updates – Phase 2
Before proceeding with the configuring of Phase 2 Incremental load firstly disable Phase 1 container as execution of it will be done once and that is already executed above.
Step 1: Do right click on Phase 1 container and click on disable.
Step 2: From “Other Task” under SSIS Toolbox drag and drop “CDC Control Task” component on “Phase 2 Incremental load” container.
Step 3: Drag another “CDC Control Task” component and drop it on “Phase 2 Incremental load” container.
Step 4: Drag “Data Flow Task” component and drop it between both “CDC Control Task” component.
Step 5: Rename all the three component.
First “CDC Control Task” component as “Start Flag Phase 2”.
Second “Data Flow Task” component as “Incremental Data”.
Third “CDC Control Task” component as “End Flag Phase 2”.
Step 6: Configure the CDC Control Task component by opening its Editor.
Step 7: Set CDC control operation as “Get processing range” by selecting it from the dropdown.
Step 8: Create a new variable by clicking on “New” which will open “Add New Variable” window, keep the default name “CDC_State” of then variable as New Variable name.
Step 9: Click on OK to close “Add New Variable” window.
Step 10: Select the table “[dbo].[cdc_states]” from the dropdown to store the variable.
Step 11: Click on OK to close Editor window.
Step 12: Connect output of “Start Flag Phase 2” component as input to “Incremental Data” and connect output of “Incremental Data” component as input to “End Flag Phase 2”.
Step 13: Open Editor of “End Flag Phase 2” component by double clicking the component.
Step 14: Set CDC control operation as “Mark processed range” by selecting it from the dropdown.
Step 15: Set variable containing CDC state, for that select it from dropdown as “User::CDC_State”.
Step 16: Select the table “[dbo].[cdc_states]” from the dropdown to store the variable.
Step 17: Click on OK to close Editor window.
Writing Incremental Data Logic
Double click the Incremental Data component which will open Data Flow pane to write logic into it.
Step 1: Now on the Data Flow pane drag and drop the CDC Source component from Other Source available under SSIS Toolbox.
Step 2: Next is to configure the same component by opening its Editor.
Step 3: Once the Editor is open click on the Connection Manager tab option which is available on the left side under this tab set the variables and tables of CDC.
Step 4: set CDC enabled table field as “[dbo].[SalesCDC]”.
Capture instance as “dbo_SalesCDC”.
CDC processing mode as “All”.
Variable containing the CDC state as as “User::CDC_State”
Please note: all of the field options are selected from the values available in the dropdown.
Step 5: Click on OK to close the CDC Source window.
Step 6: Drag and drop new component called “CDC Splitter” from “Other Transforms” available under SSIS Toolbox.
Step 7: Connect the output of “CDC Source” component as input to “CDC Splitter”.
Step 8: Drag and drop new component “ADO.NET Destination” from SSIS Toobox available under “Other Destination”.
Step 9: Connect output of “CDC Splitter” as input to “ADO.NET Destination” which will in turn open new “Input Output Selection” window.
Step 10: Select output field as “InsertOutput” from dropdown.
Step 11: Click OK to close the “Input Output Selection” window.
Step 12: Open Editor of ADO.NET Destination component by double clicking or do right click and click on Edit.
Step 13: Click Connection Manager to do the set destination table.
Step 14: Select ‘dbo’.’SalesCDCDesitnation’ available table option under the dropdown of this field.
Step 15: Finally before closing the Editor have final look on the mappings of input and destination columns, by clicking on Mappings.
Please Note: Also under one more point of noting is that it is taking values from the “dbo_SalesCDC_CT” table.
Since “Insert Output” is configured lets add a new record database called SalesMan à“T. P. Jain” and Amount à “2700”, see whether it is captured by “dbo_SalesCDC_CT” table.
For that run the container “Phase 2 Incremental load” and check the result in “dbo_SalesCDC_CT” table and the “SalesCDCDest” table.
Following image shows after running the container and then it will show the “dbo_SalesCDC_CT” table with operation marked with 2 as new record name available & destination table is also having new record inserted.
Update and Delete
In the following remaining article will next write steps for Update and Delete operation using OLEDB Command.
Starting with the steps of Update operation using OLE Command.
Step 1: Drag and drop the “OLE DB Command” component from “Common” available under SSIS Toolbox.
Step 2: Connect the output of the “CDC Splitter” component as input to the “OLE DB Command” component.
Step 3: As soon as the component are connected it will open Input Output Selection window under this select Output as “UpdateOutput”.
Step 4: Click OK to close the window.
Step 5: Rename “OLE DB Command” component as Update.
Step 6: Add new Connection Manager, by doing right click on the bottom where Connection Manager is available and click in New OLE DB Connection.
Step 7: Add New Connection Manager by clicking on “New” which will open new window screen.
Step 8: On this new window screen select Server name as “KHADAK-PC”.
Step 9: Select the database name as “CustomerDataWareHouse” from the dropdown option.
Step 10: Click on button available at the bottom to close each new window opened of Connection Manager.
Configure Update Component
Step 11: Double click the component to configure it.
Step 12: Once the Editor window is open click on the “Connection Manager” tab. Select the Connection Manager drop the dropdown which is newly created which is being named as “KHADAK-PC.CustomerDataWareHouse1”.
Step 13: Click on the next tab which is “Component Properties”.
Step 14: Now under this tab click on the field of SqlCommand which will open window to write the following Sqlcommand for update
Update [SalesCDCDest] set SalesMan=?,SalesAmount=?
Step 15: Click on the next tab which is “Column Mappings” here do the mappingsby setting
Input Column |
Destination Column |
SalesMan |
Param_0 |
SalesAmount |
Param_1 |
Update Component
Step 16: Drag and drop the “OLE DB Command” component from “Common” available under SSIS Toolbox.
Step 17: Connect the output of the “CDC Splitter” component as input to the “OLE DB Command” component. Rename the “OLE DB Command” component to “Delete”.
Step 18: Double click the component to configure it by opening its Editor. Once the Editor is open click on the “Connection Manager” tab.
Select the same connection manager “KHADAK-PC.CustomerDataWareHouse1” which was earlier selected for the update component configuration.
Step 19: Click on the next “Component Properties” tab and click on the field of SqlCommand to open the window to write the SqlCommand.
Step 20: Once the window is open write the SqlCommand for doing the delete operation and then click on OK to close the SqlCommand window & click OK to close Editor window.
Once both Update & Delete OLE DB component are configured. Next go and make changes as listed below in the source table which is [dbo].[SalesCDC] table.
Table Values |
Status |
Radhika – 2700 |
add as new record |
T.P. Jain – 2700 |
deleted record |
From “Sudarshan -2000” to “Sudarshani –
2000” |
Updated record |
Now run the Data Flow Task component which is renamed to “Incremental Data” by clicking on the Start button available on the top or do “ctrl+F5” on the keyboard.
After the package is successfully executed with green mark on the component and each output of CDC Splitter component has executed with 1 row for insert, 1 row for update and 1 row for delete.
Next is go and check the “dbo_SalesCDC_CT” table on the SQL Server where it has captured our made changes in the table as shown in the below image.
Finally also check whether the same changes are being available on the “SalesCDCDest” destination table.
For that run destination table by selecting the rows and will find that exact expected records are being displayed as shown in the below snapshot.
Hope that the CDC learning with practical execution and example is clearly understood.
Sharing with you one of the promising video from project series from MSBI(SSIS, SSAS and SSRS) hoping that you will like it