This article explores one of the features (of SQL server 2008 and higher) of providing the change tracking abilities i.e. what data has changed.
Introduction and background
Change tracking which means tracking the changes in data is an important need of modern day applications and especially the enterprise systems. Change tracking addresses the needs to query the changes to data and the access (security) information associated with the data changes, precisely the nature of change and who did it. The popular mechanism of implementing is by custom way i.e. addition of following columns in each of the database table
1. Added By :- The information about the user who added the record
2. AddedDateTime: The timestamp column for catching the exact datetime at which the record addition took place
3. UpdatedBy: The information about the user who updated i.e. edited and deleted the record
4. UpdatedDateTime: The timestamp column for catching the exact datetime at which the record updating took place
The other approached were creation of new table for storing the tracking information and using the combination of triggers and stored procedures or GUI component to put such information into these tables.
All these approaches have pros and cons but one thing is sure it requires a lot of work to be done and also it adds maintenance overhead.
The change tracking is characterized by
1. Change in row(record) or not: Whether the row has changed and recording of such change (achieved simultaneously or in the same transaction)
2. Which rows changed: which rows has changed and latest data of such changes or transaction rather than the intermediate values of multiple changes to same record/row
For recording the intermediate changes CDC i.e. change tracking feature is more useful. The information is available at http://www.dotnetfunda.com/articles/article1924-cdc-in-sql-server-2008-r2-part-i.aspx and http://www.dotnetfunda.com/articles/article1926-cdc-in-sql-server-2008-r2-part-ii.aspx.
The custom way of achieving such tracking involves lot of work and also it becomes maintenance overhead.
To address such issues, Microsoft has come up with more sophisticated and atomic approach called CT- Change Tracking. It provides ability to record all the changes caused be any DML statement that affects. This feature is particularly handy in one way or two way synchronization applications. Also this is built to work with ADO.NET synch services.
Objective
To learn about the CT feature of SQL server (version 2008 and higher) To learn about the CT feature of SQL server (version 2008 and higher) .
Description
Note: For the examples, the schema for Employee table from sample database i.e. AdvntureWorks in SQL 2008 R2 is used (after some changes in structure). For demonstration, the scripts are also attached.
Setting up CT has few administrative tasks to be performed on the database for who CT is to be implemented.
Let’s create a database called “TrialDB”.
USE master;
go
IF EXISTS (SELECT * FROM sys.databases WHERE name = 'TrialDB')
DROP DATABASE TrialDB;
go
CREATE DATABASE TrialDB;
Go
Pre-requisites
1. The compatibility level of the database shall be 90 or more. The database compatibility level can be sensed as following
If the level is lesser than 90, the function CHANGETABLE needed for querying the change tracking information would return the error although the change tracking can be configured.
1. Enabling the SNAPSHOT isolation level helps to keep the change tracking information consistent.
--SETTING SNAPSHOT ISOALATION LEVELS
ALTER DATABASE TrialDB
SET READ_COMMITTED_SNAPSHOT ON;
GO
ALTER DATABASE TrialDB
SET ALLOW_SNAPSHOT_ISOLATION ON
GO
This can be done through SQL Server Management Studio by using the Database Properties (ChangeTracking Page) dialog box.
The next step is to enable the CT for this database.
USE TrialDB;
GO
Alter Database TrialDB
Set Change_Tracking = ON
(Auto_CleanUP=ON, CHANGE_RETENTION=3 Days)
CHANGE_RETENTION is meant to specify the time period for which the tacking information be kept. The information would be removed periodically after such period. The AUTO_CLEANUP option is meant for either enabling or disabling the automatic clean-up of old tracking information
When READ_COMMITTED_SNAPSHOT is enabled the read operations don’t block the update operations and all queries running under this isolation use row versioning.
Is it all done? Not yet, the CT is enabled for database but it has to be enabled for individual tables too.
Let’s create a table called “Employee”. This table originally belongs to database “AdventureWorks”.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [Employee](
[EmployeeID] [int] IDENTITY(1,1) NOT NULL,
[NationalIDNumber] [nvarchar](15) NOT NULL,
[ContactID] [int] NOT NULL,
[LoginID] [nvarchar](256) NULL,
[ManagerID] [int] NULL,
[Title] [nvarchar](50) NULL,
[BirthDate] [datetime] NULL,
[MaritalStatus] [nchar](1) NULL,
[Gender] [nchar](1) NULL,
[HireDate] [datetime] NULL,
[VacationHours] [smallint] NULL,
[SickLeaveHours] [smallint] NULL,
[rowguid] [uniqueidentifier] ROWGUIDCOL NULL,
[ModifiedDate] [datetime] NULL,
CONSTRAINT [PK_Employee_EmployeeID] PRIMARY KEY CLUSTERED (
[EmployeeID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
Once the table is created, the CT can be enabled as shown below.
--ENABLING THE CHANGE TRACKING ON EMPLOYEE TABLE
ALTER TABLE Employee
ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON)
GO
Once the change tracking is enabled, we can verify whether the change tracking is indeed enabled or not and also know the table name for which it is enabled.
The SQL server creates and internal table for tracking with nomenclature “change_tracking_<table_object_id>”. Such table can’t be queried directly but SQL server exposes a set of functions to help retrieve the information about the changes.
Let’s try to understand what this query means for us
CHANGE_TRACKING_CURRENT_VERSION: This function is devised to return the current version number at the database level, possibly the highest number for change information
CHANGE_TRACKING_MIN_VALID_VERSION: This function returns the minimum version after the information for a table change has been retained or lowest number for a table change information
CHANGETABLE: This function is used to retrieve change information after version 0. Since we are yet to perform any DML operations after enabling Change Tracking, no records are returned.
Let’s perform some DML operations
INSERT: Following query would insert a record into Employee table
INSERT INTO [TrialDB].[DBO].[Employee]
([NationalIDNumber]
,[ContactID]
,[LoginID]
,[ManagerID]
,[Title]
,[BirthDate]
,[MaritalStatus]
,[Gender]
,[HireDate]
,[VacationHours]
,[SickLeaveHours]
)
VALUES
(14417810
,1212
,'trial\guy1'
,16
,'Production Technician - WC60'
,NULL
,'M'
,'M'
,GETDATE()
,21
,30
)
So this INSERT operation shall be captured and we can confirm this with following query.
SYS_CHANGE_OPERATION denotes the database operation
1: D: DELETE
2: I: INSERT and UPDATE
Let’s run the UPDATE query as following
UPDATE [TrialDB].[DBO].[Employee]
SET [NationalIDNumber]= 14417815
WHERE [NationalIDNumber]= 14417810
The change tracking details are
As seen, current version is 2 and change operation is “I” which is same for INSERT and UPDATE.
Retrieving all such DML changes is little tricky. It is as shown in the screenshot
It shows one record ahd change tracking version as 2. So it is essentially showing the latest version as the for INSERT and UPDATE , it is treated as INSERT.
DELETE: We would delete the first record and see how it is captured.
DELETE FROM [TrialDB].[DBO].[Employee]
WHERE [EmployeeID]=3
Following are the results for change information and changes.
This shows the latest change i.e. deletion and as the columns referred in select list as shown as NULL because it is deleted.
The last but not the least is to know which columns changed. This operation is also tricky in a sense that it needs a join between SQL server internal table and the table for which the changes are tracked.
DECLARE @PreviousVersion bigint = 0
SELECT CTTable.EmployeeID, CTTable.SYS_CHANGE_OPERATION, Emp.NationalIDNumber, Emp.ContactID, Emp.Gender,
[ContactIDChanged?] = CHANGE_TRACKING_IS_COLUMN_IN_MASK(COLUMNPROPERTY(OBJECT_ID('Employee'),
'ContactID', 'ColumnId'), SYS_CHANGE_COLUMNS), [NationalIDNumberChanged?] = CHANGE_TRACKING_IS_COLUMN_IN_MASK(COLUMNPROPERTY(
OBJECT_ID('Employee'),'NationalIDNumber', 'ColumnId')
, SYS_CHANGE_COLUMNS),
[Gender?] = CHANGE_TRACKING_IS_COLUMN_IN_MASK(COLUMNPROPERTY(OBJECT_ID(
'Employee'),'Gender', 'ColumnId')
, SYS_CHANGE_COLUMNS)
FROM CHANGETABLE (CHANGES Employee, @PreviousVersion) AS CTTable
LEFT OUTER JOIN Employee AS Emp
ON emp.EmployeeID = CTTable.EmployeeID
Following screenshot shows the query and results both.
The change information for column like ‘'NationalIDNumber'’, ‘'ContactID'’ and ‘'Gender'’ is asked for and of course after deletion these columns have changed.
Summary and Conclusion
The change tracking in SQL server is better alternative for custom solutions for tracking the database changes and also can be really useful in synchronization/ replication scenarios. The retrieval of change information is little tricky and needs some work of joining the internal tables and the user table for which the change tracking is enabled.
We saw a practical example and also went through database operations and the capture of changes.
This looks like very promising feature in case where information about the last changes can matter and it would be easy to choose for at least new project. For existing systems, there is little one time overhead for setting up the CT but it would be worth vis-à-vis custom and cumbersome solutions.
HAPPY PROGRAMMING!!!
Reference
http://msdn.microsoft.com/en-us/library/bb933875.aspx
http://msdn.microsoft.com/en-us/library/bb964713.aspx
http://msdn.microsoft.com/en-us/library/bb933874.aspx
http://msdn.microsoft.com/en-us/library/bb933875.aspx
http://msdn.microsoft.com/en-us/library/bb964713.aspx
http://msdn.microsoft.com/en-us/library/bb933874.aspx