Good day everybody,
This post won’t cover how a T-SQL MERGE STATEMENT works, instead we are going to talk about a specific real life scenario, the question I’ll try to answer is How to subset the TARGET in a T-SQL MERGE STATEMENT?
A MERGE STATEMENT is very commonly used to apply a CDC (Change Data Capture) process, as part of this process we may find tables with millions of records that for obvious reasons we want to load them in an incremental load. To do an incremental load I’m doing the following, use a data processing tool (I use ETL-SSIS), dynamically filtering the source query and a Stored Procedure to MERGE these records (CDC Process).
In a perfect World, my audit columns (created_date, last_updated_date) exist in my source table and are working as expected. Now, we surely know that real life is tougher, as you may know already, the audit columns are not getting updated when the attributes change, so that changes the way we do the MERGE STATEMENT, now we’ll need to incrementally load the data for a 40 days window, something like this:
SELECT Column1,Column2,.. FROM SourceTable WHERE PrimaryKeyDate> GETDATE()-40
Then, we load these records into a staging table. We use the Staging table to apply a CDC process against our destination table. In this case a MERGE STATEMENT with the following structure will handle the :Inserts,Updates,Deletes.
MERGE INTO TargetTable AS TARGET USING (SELECT Column1, Column2... FROM SourceTable) AS SOURCE ON TARGET.PrimaryKey=SOURCE.PrimaryKey AND TARGET.PrimaryKeyDate=SOURCE.PrimaryKeyDate --In this case, one of the date fields is a Primary Key WHEN NOT MATCHED BY TARGET THEN INSERT (Column1, Column2...) VALUES (SOURCE.Column1, SOURCE.Column2, .... ) WHEN MATCHED AND (TARGET.Hash_Column<>SOURCE.Hash_Column) THEN UPDATE SET TARGET.Column1=SOURCE.Column1 ,TARGET.Column2=SOURCE.Column2 ,.... WHEN NOT MATCHED BY SOURCE THEN UPDATE SET TARGET.Flag='Deleted'
The issue appears when the MERGE STATEMENT compares the SOURCE (With 40 days of data) against the Target (ALL previously loaded records), because of the way the DELETE works, the MERGE statement will delete (in my example a soft delete) all the records from the target that are not part of the 40 Day window.
So, a quick fix for this was to COMPARE apples to apples, in other words if I’m comparing 40 days of data, I need the target to subset the same 40 days of data. To subset the TARGET in a Merge statement you have to use a CTE (Common Table Expression) , this is how I did it…
;WITH CTEtargetSubset AS (SELECT * FROM TargetTable WHERE PrimaryKeyDate>GETDATE()-40) MERGE INTO CTEtargetSubset AS TARGET USING....--exactly the same code explained above
And it worked perfectly, what do you think? I’m pretty sure there are many workarounds and alternatives, if you have better suggestions please comment below.