Good day everybody,
This post won’t cover how a T-SQL MERGE STATEMENT works, instead we are going to talk about a specific real life scenario. How do we subset the TARGET in a T-SQL MERGE STATEMENT?
A MERGE STATEMENT is very commonly used to apply a CDC (Change Data Capture) process, as part of this process we may find tables with millions of records that for obvious reasons we want to load them incrementally into our destination table using any kind of data processing tool (I use ETL-SSIS) and a Stored Procedure to MERGE these records (CDC Process).
So far so good, it looks like a normal scenario, assuming that we have a last updated date audit column in our Source table that we can easily use and a created date, we can apply filters to our source query and incrementally load records. Now, I’d like to add that there’s a specific requirement that changed the way we do the MERGE STATEMENT, for various reasons (i.e. audit columns not getting updated when the attributes change) we need to incrementally load the data for a window of 40 days, something like this:
SELECT Column1,Column2,.. FROM SourceTable WHERE PrimaryKeyDate> GETDATE()-40
Then, we load these records into a staging table. We use the Staging table to apply a CDC process against our destination table. In this case a MERGE STATEMENT with the following structure will handle the :Inserts,Updates,Deletes.
MERGE INTO TargetTable AS TARGET USING (SELECT Column1, Column2... FROM SourceTable) AS SOURCE ON TARGET.PrimaryKey=SOURCE.PrimaryKey AND TARGET.PrimaryKeyDate=SOURCE.PrimaryKeyDate --In this case, one of the date fields is a Primary Key WHEN NOT MATCHED BY TARGET THEN INSERT (Column1, Column2...) VALUES (SOURCE.Column1, SOURCE.Column2, .... ) WHEN MATCHED AND (TARGET.Hash_Column<>SOURCE.Hash_Column) THEN UPDATE SET TARGET.Column1=SOURCE.Column1 ,TARGET.Column2=SOURCE.Column2 ,.... WHEN NOT MATCHED BY SOURCE THEN UPDATE SET TARGET.Flag='Deleted'
The issue appears when the MERGE STATEMENT compares the SOURCE (With 40 days of data) against the Target (ALL previously loaded records), because of the way the DELETE works, the MERGE statement will delete (in my example a soft delete) all the records from the target that are not part of the 40 Day window.
So, a quick fix for this was to COMPARE apples to apples, in other words if I’m comparing 40 days of data, I need the target to subset the same 40 days of data. To subset the TARGET in a Merge statement you have to use a CTE (Common Table Expression) , this is how I did it…
;WITH CTEtargetSubset AS (SELECT * FROM TargetTable WHERE PrimaryKeyDate>GETDATE()-40) MERGE INTO CTEtargetSubset AS TARGET USING....--exactly the same code explained above
And it worked perfectly, what do you think? I’m pretty sure there are many workarounds and alternatives, if you have better suggestions please comment below.