How to do a change capture for a Slowly Changing Dimension

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
pavi
Premium Member
Premium Member
Posts: 34
Joined: Mon Jun 03, 2013 2:34 pm

How to do a change capture for a Slowly Changing Dimension

Post by pavi »

Hello Team,

Can any one please resolve this issue?The scenario is

After:

Key,Value
ABC,10
ABC,10

Before:
Key,Value
ABC,20

O/P:

Key,Value,Change code
ABC,10,Update
ABC,10,Insert


Expectation is both after records are updates.

Am I missing any thing
Mike
Premium Member
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Your missing the fact that the change capture stage requires unique keys on both of its inputs.

Mike
pavi
Premium Member
Premium Member
Posts: 34
Joined: Mon Jun 03, 2013 2:34 pm

Post by pavi »

Thanks Mike for your response, But it didn't work.

If I understood well, the modified scenario is
Today:

Key,Value ,QTY
ABC,10,1
ABC,10 ,2

Yesterday:
Key,Value
ABC,20

O/P:

Key,Value,Change code,QTY
ABC,10,Update ,1
ABC,10,Insert,2
Mike
Premium Member
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

You didn't understand... How many rows with a key value of "ABC" do you have in your today input? If the answer is not one, then the change capture stage won't produce expected output.

If you have a scenario where yesterday is a "master" file and today contains multiple update transactions, the Merge stage might supply functionality you can use.

Mike
pavi
Premium Member
Premium Member
Posts: 34
Joined: Mon Jun 03, 2013 2:34 pm

Post by pavi »

Thanks alot Mike for quick response.

Our source is a transaction DB where I can get duplicate keys with same value and the fact information might vary based on quantity of the products bought by customers. We are using change capture to see if there is an update of insert. I believe Merge stage doesn't tell whether a record is an update or delete. So I am thinking to do remove duplicates on Key before applying change capture.

DB---> RemDup---> Change capture--> apply Type2 logic---> insert/updateDB

We know from requirements that we will get dups from source with same value but the fact Qty might change.

Am I on the right direction? Please suggest
Mike
Premium Member
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

You'll need to think through your requirements...

Do you need to record every transaction?
Do you only need to record the last transaction?
How do you identify the last transaction?
etc.

Mike
pavi
Premium Member
Premium Member
Posts: 34
Joined: Mon Jun 03, 2013 2:34 pm

Post by pavi »

Mike we record every transaction. So if we have 3 transactions which occurred for a product we are pulling all three. One of the subset requirements is if these three transactions have same key then it will be either an update or an insert. We need to push three records into fact and 1 record as an update into dimension. As it is a Type 2, we need to insert one record in a dimension and update the previous one.
Mike
Premium Member
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Well, if you need to record all transactions, then removing duplicates is not going to be a good idea.

The SCD Stage handles multiple transactions and was made for maintaining slowly changing dimensions.

Mike
pavi
Premium Member
Premium Member
Posts: 34
Joined: Mon Jun 03, 2013 2:34 pm

Post by pavi »

Thanks Mike for quick response!!

Out of my curiosity how to implement this scenario in DataStage 7 where we don't have SCD stage?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Your original post specified 8x, which is probably why Mike responded as he did.

In earlier versions you have to do all the coding yourself; the lookup on the business key, the testing to see whether the row exists and, if so, whether there are any changes, the generation of the new key value and the handling of the SCD2 dimension (SCD1 is a simple update).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Mike
Premium Member
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

pavi wrote:Out of my curiosity ...
Since it was just your curiosity, I didn't feel the urge to describe it for a DataStage version that is out of support.

Ray's description of the old design pattern makes it obvious just how convenient the new SCD stage is.

One thing to add... since you have multiple change transactions coming on the stream input, the testing for type 2 changes becomes a little more complex... you need to check both the lookup results and the previous input record to determine changes.

Of course, if you'd really like to satisfy your curiosity, you can still create the 7X design pattern in an 8X environment.

Mike
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

ray.wurlod wrote:In earlier versions you have to do all the coding yourself
It's not really all that hard. You dang kids with your fancy smancy SCD stage, get off my lawn! :P
-craig

"You can never have too many knives" -- Logan Nine Fingers
pavi
Premium Member
Premium Member
Posts: 34
Joined: Mon Jun 03, 2013 2:34 pm

Post by pavi »

Thanks a lot Mike, Ray,Chullet..

Even though SCD is a robust stage to handle this scenorio.I implemented it in the way we implement in 7x as it shows me whats going on rather than bundling it in single stage.

Thanks a lot again for you time and response.
Post Reply