Posts Tagged ‘MergeJoin Performance’

SSIS – MERGE JOIN pass through of Sorting using XML to improve performance

November 4, 2012 1 comment

In some scenarios, we always end-up by using SSIS MERGE JOIN component instead of LOOKUP component while joining multiple data sources to get multiple matching records on matching business condition due to the limitation of SSIS LOOKUP component. However as we know MERGE JOIN requires sorting source while not having advanced properties by design.
Here planning to discuss a scenario where we required various sorting at each source join and an update at MERGER JOIN XML code to avoid SORTING components to improve performance.

Step 1: Two data sources with pre-existing sorting are joining using MERGE JOIN, here we are joining based on only one key though we have 2 sorted keys from BusinessPriority1 source as

Step 2: The result of two sources (output of MRGJ Sources 1) needs to join another source BusinessKey3 based on 2 keys. MERGE JOIN is not pass through of existing sorting of 2 keys from original source i.e. MERGE JOIN is limiting the sorting based on join keys

Step 3: In order to maintain the join condition of sources, we have used SORT component to sort the source on 2 business keys as

Step 4: We can join the sources on required 2 business keys as

Step 5: On execution of the package, data load has completed in 0.516 sec, working with very small datasets

Step 6: Unlike OLE DB Source, MERGE JOIN don’t have any advanced settings option to set IsSort and SortKeyPosition options

Step 7: Go to XML code of the package using view code option as

Step 8: MERGE JOIN is maintaining only one sortKeyPosition while not passing of pre-existing sorting from original source

Step 9: Just add the code at output columns i.e. added sortKeyPosition = “2” at output column to make the sort order on MERGE JOIN columns and save the changes

Step 10: SSIS designer picks-up the XML changes and gives the warning on SORT of existing columns so we can now remove the SORT component

Step 11: We can get the MERGE JOIN Order of columns directly from output of MERGE JOIN

Step 12: On execution, package has successfully executed in 0.391 sec, which is lot better in performance when compared to usage of SORT component

Points to consider :

1. Have used SSIS 2012 SSDT for this demo but we can also implement the same approach at SSIS 2008 BIDS.
2. Used a small dataset for this demo but I implemented this approach at production to replace many of SORT component and achieved more than 40 per cent of load performance.
3. Updating of existing SortKeys order at XML code may fail the component so use to extend the sort functionality of other columns rather than changing the existing sort order