Cartesian/Cross Join

I was once asked in a job interview if I knew what a Cartesian/Cross Join was.

A more interesting question is: How can a Cross Join be useful?

There are actually many good reasons to use a Cross Join. Outlined below are a couple the most common scenarios that I have used in Data Warehousing:

1. Junk Dimension

In addition to the advantages listed in the Kimball article, Junk Dimensions can greatly reduce the overhead of processing a large dimension. Consider the following real life scenario:

An existing type 1 slowly changing customer dimension with 10 + million members, needs  to have a set of Flag/Boolean attributes (one for each day of the week) added to it. and just make things worse these attributes need to be type 2. Do you change the already large dimension to SCD 2 on attributes that change might frequently? NO!

Solution: Create a Junk Dimension with all possible combinations of the Flags/Boolean indicators. see screen shots:

–create table to hold weekday values
declare @WeekFlag table
(Sunday bit,
Monday bit,
Tuesday bit,
Wednesday bit,
Thrusday bit,
Friday bit,
Saturday bit)
–insert boolean values for the each weekday
insert into @WeekFlag values( 0,0,0,0,0,0,0)
insert into @WeekFlag values( 1,1,1,1,1,1,1)
–look at table
select * from @WeekFlag
— self join table by using cross join
select a.Sunday,b.Monday,c.Tuesday,d.Wednesday,e.Thrusday,f.Friday,g.Saturday
from (select Sunday from @WeekFlag) as  a
cross join (select Monday from @WeekFlag) as  b
cross join (select Tuesday from @WeekFlag) as  c
cross join (select Wednesday from @WeekFlag) as  d
cross join (select Thrusday from @WeekFlag) as  e
cross join (select Friday from @WeekFlag) as  f
cross join (select Saturday from @WeekFlag) as  g

image

2. Time Dimension

“Time Dimension” here does not mean a Time Dimension with the granularity of a day (like AdventureWorks) but a Dimension with a granularity LESS THAN A DAY. Example: analyze transactions down to the millisecond.

Don’t try to combine a Dimension with a granularity of a day with one of a millisecond you will end up with billions or more members!!

Solution:

Cross Join each level of the Time Dimension.

Milliseconds –> Seconds –> Minutes –> Hours

here is an example of the first cross join in the series:

declare @MilliSecTable table ( MilliSecValue int)
declare @Msec as int = 0
    insert into @MilliSecTable values (@Msec)
while (select MAX(MilliSecValue) from @MilliSecTable) < 999
begin
    insert into @MilliSecTable values((select MAX(MilliSecValue) from @MilliSecTable) + 1)
end
—————————————————————————————–
declare @SecTable table ( SecValue int)
declare @Sec as int = 0
    insert into @SecTable values (@Sec)
while (select MAX(SecValue) from @SecTable) < 59
begin
    insert into @SecTable values((select MAX(SecValue) from @SecTable) + 1)
end

select SecValue,MilliSecValue from @SecTable cross join @MilliSecTable

image

**Note: The above code also performs a lot better than simply each incrementing each member by 1 millisecond. That could literally take days 😉

Resources:

The Many-to-Many Revolution: This takes the basic idea of T-SQL Cross Joins and applies them to SSAS. It is a great read!

Advertisements

Tags: ,

2 Responses to “Cartesian/Cross Join”

  1. Sissi Says:

    Hi
    I am very new with MDS and SSIS, and I am going to work with both of them. But I would like to know how is MDS and SSIS works together? What is the walkflow to start with?For instance, should I start with MDS, export excel, then use it as data source in SSIS or? I do not know where to start with. Could you give me a brief intro? Thanks.

    • Garrett Edmondson Says:

      Good question unfortunately the answer to most of them is that it depends on the requirements 🙂 But to begin with MDS should only be used for dimensional type data and no transactional data. SSIS can be used to import dimensional type data and sync it back to the source systems or let it continue to a data warehouse from MDS. But not all the data is always available in a source system so then excel with MDS can be used to input or manage data that is not present or has not been defined correctly in any source systems. Read Microsoft SQL Server 2012 Master Data Services 2/E

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: