Archive

Posts Tagged ‘SQL – CDC’

SQL – Change Data Capture

September 3, 2011 Leave a comment

Change data capture is a SQL feature to capture any changes to data in tables/columns. This feature is mainly useful to capture changes in a database while doing auditing.
Implementing the CDC using BDD (Behavioural Driven Development) methodology with the following four cases
1. Enabling CDC
2. Capturing changes-Insert/Update/Delete
3. Analysis of data from CDC
4. Disabling CDC
 
Here considering AdventureWorks Database for CDC implementation
 
BDD Case 1: Enabling CDC
 
Step 1: Create two source tables with foreign key relationship as

USE [AdventureWorks]
GO

CREATE TABLE [dbo].[OrderHeader](
	[SalesOrderID] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
	[RevisionNumber] [tinyint] NOT NULL,
	[OrderDate] [datetime] NOT NULL DEFAULT GETDATE(),
	[DueDate] [datetime] NOT NULL DEFAULT GETDATE(),	
	[ModifiedDate] [datetime] NOT NULL DEFAULT GETDATE()
 CONSTRAINT [PK_OrderHeader_SalesOrderID] PRIMARY KEY CLUSTERED 
(
	[SalesOrderID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

CREATE TABLE [dbo].[OrderDetail](
	[SalesOrderID] [int] NOT NULL,
	[SalesOrderDetailID] [int] IDENTITY(1,1) NOT NULL,	
	[ModifiedDate] [datetime] NOT NULL DEFAULT GETDATE(),
 CONSTRAINT [PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID] PRIMARY KEY CLUSTERED 
(
	[SalesOrderID] ASC,
	[SalesOrderDetailID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

USE [AdventureWorks]
GO

ALTER TABLE [dbo].[OrderDetail]  WITH CHECK ADD  CONSTRAINT [FK_OrderDetail_OrderHeader_SalesOrderID] FOREIGN KEY([SalesOrderID])
REFERENCES [dbo].[OrderHeader] ([SalesOrderID])
ON DELETE CASCADE
GO

ALTER TABLE [dbo].[OrderDetail] CHECK CONSTRAINT [FK_OrderDetail_OrderHeader_SalesOrderID]
GO

 
Step 2: Check the record count in the source tables

 
-- Record Count 
USE AdventureWorks
SELECT COUNT(*) AS [OrderHeaderCount]
 FROM [dbo].[OrderHeader]

SELECT COUNT(*) AS [OrderDetailCount]
 FROM [dbo].[OrderDetail]

Source record count

Source record count


 
Step 3 : Check whether CDC is enabled for Database

--To check whether CDC is enabled for database. 
USE master
GO
SELECT [name], database_id, is_cdc_enabled 
FROM sys.databases
WHERE [name] = 'AdventureWorks' --Take of WHERE clause to check for all the DBs' on server     
GO 

is_cdc_enabled value denotes as
0 — not enabled
1 — enabled

CDC - status

CDC - status


 
Step 4: Enable CDC at Database level as

-- Enable CDC in AdventureWorks database.
USE AdventureWorks
GO
EXEC sys.sp_cdc_enable_db
GO

CDC @database

CDC @database


 
Step 5: Check the CDC enable status at Database level

--To check whether CDC is enabled for database. 
USE master
GO
SELECT [name], database_id, is_cdc_enabled 
FROM sys.databases
WHERE [name] = 'AdventureWorks' --Take of WHERE clause to check for all the DBs' on server     
GO 

CDC @database

CDC @database


 
Step 6: Check any tables in database have already enabled for CDC as

--To check tables of database have already been enabled for CDC.
USE AdventureWorks
GO
SELECT [name], is_tracked_by_cdc 
FROM sys.tables
GO  

CDC @tables

CDC @tables


 
Step 7: Check for any tables in system tables
 
Check system tables

Check system tables


 
Step 8: Enable CDC for 2 source tables as

--The script to enable CDC on dbo.OrderHeader table, i.e. enabling on all the columns of table
USE AdventureWorks
GO
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name   = N'OrderHeader',
@role_name     = NULL  
--Permission to access the CDC tables to particular table
--here passing NULL means everyone could get the access to CDC table data
--@captured_column_list = '[RevisionNumber],[OrderDate]' To enable onlyparticular columns in the table 
GO

On enabling CDC on first table, it will create two SQL Agent Jobs, CDC requires SQL Server Agent services on the server.

CDC @table and  SQL Agent jobs

CDC @table and SQL Agent jobs


CDC @table

CDC @table


 
Step 9: Enabling CDC will create system tables to hold the tracking data on data change at source tables
CDC system tables

CDC system tables


 
BDD Case 2: Capturing changes Insert/Update/Delete
 
Step 1: Check for any existing records in the source and cdc system tables

--Check for any existing records. 
USE AdventureWorks
GO
SELECT *
 FROM dbo.OrderHeader
GO
USE AdventureWorks
GO
SELECT *
FROM dbo.OrderDetail
GO
USE AdventureWorks
GO
SELECT *
 FROM cdc.dbo_OrderHeader_CT
GO
USE AdventureWorks
GO
SELECT *
FROM cdc.dbo_OrderDetail_CT
GO

Check source tables data

Check source tables data


Check source tables data

Check source tables data


 
Step 2: CDC – insert and check CDC tables
Insert one record in each table in a single transaction, here using single transaction to for both tables insertion to check how CDC is giving the change status of records

--Insert a new record in user tables in a single transaction. 
USE AdventureWorks
GO
INSERT INTO [dbo].[OrderHeader]
           ([RevisionNumber])
     VALUES
           (1)
INSERT INTO [AdventureWorks].[dbo].[OrderDetail]
           ([SalesOrderID])
     VALUES
           (1)
GO

Insert records

Insert records


Check for CDC tables as using source query
CDC - Insert

CDC - Insert


 
Notes:
1. Insertion has captured and recorded in the CDC tables
2. _$start_lsn (Last Sequence Number) is different for both records even though insertion in a single transaction so CDC is making records capture individually but not as a group.
3. _$operation value is 2 (which is for Insertion capture of CDC)
&nbsp
Step 3: CDC – Update and check CDC tables
Updating the existing records in source tables as

--Update Records in user tables in a single transaction.
USE AdventureWorks
GO
UPDATE [dbo].[OrderHeader]
SET ModifiedDate = GETDATE()
UPDATE [dbo].[OrderDetail]
SET ModifiedDate = GETDATE()
GO

Update source records

Update source records


 
Check for CDC tables using source query
CDC - Update

CDC - Update


 
Notes:
1. Updates has captured and recorded in the CDC tables
2. _$start_lsn (Last Sequence Number) is different for both records even though updates in a single transaction so CDC is making records capture individually but not as a group.
3. _$operation values are 3,4 (these are for updates capture of CDC)
Here 3, is for the record before the update
4, is for the record after the update
 
Step 4: CDC – Delete and check CDC tables
 
Delete the existing records in source tables as

--Delete Records in user tables in a single transaction.

USE AdventureWorks
GO
DELETE FROM  [dbo].[OrderDetail]
DELETE FROM  [dbo].[OrderHeader]
GO 

Delete source records

Delete source records


Check for CDC tables using source query
CDC - Delete

CDC - Delete


 
Notes:
1. Deletion has captured and recorded in the CDC tables
2. _$start_lsn (Last Sequence Number) is different for both records even though update in a single transaction so CDC is making records capture individually but not as a group.
3. _$operation value is 1 (which is for Delete capture of CDC)
 
BDD Case 3: Analysis of data from CDC
 
Analysis of data from CDC tables is required to get the changes in the source tables.
 
Step 1: CDC will create the functions as
CDC - functions

CDC - functions


 
The script will give the changes in the database from yesterday and could also use other functions according to the requirement to get the data from CDC tables

USE AdventureWorks 
GO 
DECLARE @start_time DATETIME, @end_time DATETIME, @start_lsn BINARY(10), @end_lsn BINARY(10); 
SELECT @start_time = GETDATE()-1, @end_time = GETDATE(); 
SELECT @start_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than', @start_time); 
SELECT @end_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @end_time); 
SELECT * 
FROM cdc.fn_cdc_get_all_changes_dbo_OrderHeader(@start_lsn,@end_lsn,'all') 
GO

CDC data

CDC data


 
BDD Case 4: Disabling CDC
 
Disabling the CDC is a two-step process as equal to enabling process
 
Step 1: Disabling CDC on tables

--Disabling Change Data Capture on a table
USE AdventureWorks;
GO
EXECUTE sys.sp_cdc_disable_table
    @source_schema = N'dbo',
    @source_name = N'OrderHeader',
    @capture_instance = N'dbo_OrderHeader';
GO
EXECUTE sys.sp_cdc_disable_table
    @source_schema = N'dbo',
    @source_name = N'OrderDetail',
    @capture_instance = N'dbo_OrderDetail';
GO

CDC - Disable @table

CDC - Disable @table


 
Step 2: Disabling CDC on Database

--Disable Change Data Capture on Database
USE AdventureWorks
GO
EXEC sys.sp_cdc_disable_db
GO 

CDC - Disable @database

CDC - Disable @database


 
On disabling the CDC on tables and Database, all the CDC tables and functions will be removed from the database along with CDC SQL Agent jobs.
 
Points to consider:
 
1. CDC capture the changes in database individual record level but as a group of records changes so getting the changes in table with a group by transaction is not possible as here didn’t find the changes using single transaction DML operations.
2. CDC performance will go down if we are using any bulk operations as it needs to capture changes besides regular bulk operation
3. CDC will not accept Truncation on table

Categories: SQL Server Tags: