This article (Part II) discusses the intrinsic of CDC as one of the feature (of SQL server 2008 and higher) for providing the data auditing abilities
Introduction
In the previous article (http://www.dotnetfunda.com/articles/article1924-cdc-in-sql-server-2008-r2-part-i.aspx), we discussed about
auditing needs in general and solutions for data auditing. Also we saw how to
implement CDC and consequent results.
In this article, we would dive deep to know
what the architecture of CDC and how it is structured in the SQL server 2008.
Objective
To learn
about the intrinsic of CDC feature of SQL server (version 2008 and higher) To learn
about the intrinsic of CDC feature of SQL server (version 2008 and higher)
Description
Note: SQL
2008 R2 is referenced and also couple of MSDN articles, image/s are referred for
this write-up.
The diagram which depicts the data flow for
CDC is as below
(Image
taken from http://msdn.microsoft.com/en-us/library/cc645937%28v=sql.105%29.aspx)

As seen in the diagram, the data capture
process captures the data from transaction logs resulting better performance by
avoiding overload on DML operations and transactions. The snapshot for captured
changed data is retrieved through the query functions (as shown above) which
are used for data warehouse.
CDC database tables
After reading the transaction logs, it
populates the change tables. This change tables are mentioned in the Part I
still we would revisit them.
[cdc].[captured_columns]
[cdc].[change_tables]
[cdc].[ddl_history]
[cdc].[index_columns]
[cdc].[lsn_time_mapping]
[cdc].[<capture_instance>_CT]
Apart from the last one, these are kind of
system table for CDC and actual data would be captured in separate tables which
would correspond to the database tables for which CDC is to be enabled (as
indicated last in the list)
CDC Database Stored Procedures
Following stored procedures are created in
database as a part of enabling the CDC for particular database.
sys.sp_cdc_add_job (Transact-SQL)
sys.sp_cdc_generate_wrapper_function (Transact-SQL)
sys.sp_cdc_change_job (Transact-SQL)
sys.sp_cdc_get_captured_columns
(Transact-SQL)
sys.sp_cdc_cleanup_change_table
(Transact-SQL)
sys.sp_cdc_get_ddl_history (Transact-SQL)
sys.sp_cdc_disable_db (Transact-SQL)
sys.sp_cdc_help_change_data_capture
(Transact-SQL)
sys.sp_cdc_disable_table (Transact-SQL)
sys.sp_cdc_help_jobs (Transact-SQL)
sys.sp_cdc_drop_job (Transact-SQL)
sys.sp_cdc_scan (Transact-SQL)
sys.sp_cdc_enable_db (Transact-SQL)
sys.sp_cdc_start_job (Transact-SQL)
sys.sp_cdc_enable_table (Transact-SQL)
sys.sp_cdc_stop_job (Transact-SQL)
These are employed for
1.
Configure CDC
2.
Manage (i.e. create and run)
the CDC agent jobs
3.
Provide the metadata to CDC
consumers i.e. the one who would be utilizing CDC which could be a data
warehouse of simple system for display history
CDC Database Functions
Following functions are created to retrieve
the information about the changes which are captured as part of CDC.
sys.fn_cdc_has_column_changed
(Transact-SQL)
sys.fn_cdc_increment_lsn (Transact-SQL)
sys.fn_cdc_decrement_lsn (Transact-SQL)
sys.fn_cdc_is_bit_set (Transact-SQL)
sys.fn_cdc_get_column_ordinal
(Transact-SQL)
sys.fn_cdc_map_lsn_to_time (Transact-SQL)
sys.fn_cdc_get_max_lsn (Transact-SQL)
sys.fn_cdc_map_time_to_lsn (Transact-SQL)
sys.fn_cdc_get_min_lsn (Transact-SQL)
cdc.fn_cdc_get_net_changes_<capture_instance>
(Transact-SQL)
cdc.fn_cdc_get_all_changes_<capture_instance>
(Transact-SQL)
The last 2 functions returns the rows for
change applied for data. Other functions are administrative and related to meta
data.
SQL Server Agent Jobs
There are 2 SQL server agent jobs for CDC
1.
To populate the change tables à Capture Job
2.
To clean-up the change tablesà Clean-up job
These jobs
are created through stored procedures once a database table is enabled for CDC.
The “Capture Job” starts immediately and removed when CDC is disabled for
database. The creation and removal can be controlled by administrators.
Limitations of CDC
CDC doesn’t support the capturing of
changes in case of computed columns, usage of column-set in case of sparse
columns. For timestamp data type the data is converted into binary.
Querying and meta data
Let’s check out for the columns in the
tables meant for captured data.
(i)
__$operation denotes the database operation
1: DELETE 2:
INSERT 3: Before UPDATE 4: After UPDATE
(ii)
__$start_lsn
: It’s a Log Sequence Number (LSN) of
data type binary(10) and is common in one transaction.
(iii)
__$end_lsn
: Meant for informational purpose
(iv)
__$seqval
: Represents a sequence value for order (sequence) in a transaction
(v)
__$update_mask
: Serves as a bit mask for identifying which columns from change tables are
changed.
Conclusion
The data auditing through CDC is really
useful feature. In the part I of this series we saw practical example. We
delved deep in CDC intrinsic and touch based the data flow overview, the stored
procedures, agent jobs and querying etc. in this part.
Definitely there is a bit of administrative
and monitoring overhead involved in CDC but it is taken care by the atomic,
unified, configurable execution of capturing the changed data.
Hope these articles have helped to
understand the minimum fundamental aspects of CDC for actual implementation and
also implementation steps.
HAPPY PROGRAMMING!!!
Reference
http://msdn.microsoft.com/en-us/library/bb522489(v=sql.105).aspx
http://msdn.microsoft.com/en-us/library/cc645858.aspx
If you like this article, subscribe to our
RSS Feed. You can also subscribe via email to our Interview Questions, Codes and Forums section.