Archive

Archive for December, 2011

HASHBYTES() for lookup

December 29, 2011 Leave a comment

We have many options to lookup target Junk DIM to retrieve the SID while in process the Datamart FACT tables.
I would like to do this process using HASHBYTES() as

-- Source table data with prepared HASHBYTES 
SELECT TOP 10
        CITY,
        StateProvince,
        CITY + StateProvince AS [ForHash],
        hashbytes('SHA1', CITY + StateProvince) AS [SourceHashBytes]
FROM    SalesLT.Address
-- Target table data with prepared HASHBYTES
SELECT  CityID,
        City,
        [State],
        hashbytes('SHA1', CITY + [State]) AS [TargetHashBytes]
FROM    BITEST.dbo.TargetJunkDim
-- Lookup to Target JunkDIM to retrieve the SID 
SELECT TOP 10
        [Target].CityID,
        [Source].CITY,
        [Source].StateProvince
FROM    SalesLT.Address AS [Source]
        LEFT JOIN BITEST.dbo.TargetJunkDim AS [Target] ON hashbytes('SHA1', [Source].CITY + [Source].[StateProvince]) = hashbytes('SHA1', [Target].CITY + [Target].[State])


 
Points to consider
 
1. Haven’t discussed the concept of Junk DIM here as HASHBYTES() is the main concern this post besides writing lookup script by comparing individual columns using either at ETL LOOKUP or SQL LEFT JOIN
2. Haven’t discussed the Junk DIM concept here but the above scenario is well processed to accommodate the logical flow of Junk DIM
3. Have used SQL LEFT JOIN as it is equality concept of ETL Lookup process

Categories: SSIS, TSQL Tags: