Duplicate Key values in CDC Stage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

phanikumar
Participant
Posts: 60
Joined: Tue Sep 20, 2011 10:44 pm
Location: INDIA

Duplicate Key values in CDC Stage

Post by phanikumar »

Hi All,

Can the CDC stage handle duplicates? We have a scenario where duplicate values are coming in on the KEY column, and the job produces different results when run multiple times.

The job does a join on the KEY column and produces multiple records for the same key, but the change codes are not consistent across runs.

The KEY column is hash partitioned and sorted on both links.

Can anyone please help me understand why it would produce different results?

Regards
Kumar
rbk
Participant
Posts: 23
Joined: Wed Oct 23, 2013 1:10 am
Location: India

Post by rbk »

Not sure, but this is a scenario I have noticed as well.

I have seen cases where, when we have duplicates in the source (after link), the first record gets identified as a copy (assuming the data is available in the reference/before link as well) and the second record gets identified as an insert. Not sure why it does that; it would be nice to get an understanding of how exactly the CDC stage works.

Also, I think it is better not to have duplicates in the source or reference, considering that we are trying to identify changes. Do let us know if you come across any solutions...
Cheers,
RBK
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

I'm not sure if this is documented or if it is just something I know from experience.

The change capture stage requires unique keys on its inputs.

This makes perfect sense if you think about the classic two-file match logic that probably happens under the covers, where a key match results in the next record being read from each file before the next key comparison.
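The two-file match Mike describes can be sketched in a few lines of Python. This is an illustration of the general merge-comparison technique, not the stage's actual implementation; the change code values (0 = copy, 1 = insert, 2 = delete, 3 = edit) follow the stage's documented defaults, but everything else is an assumption for the sketch.

```python
def change_capture(before, after):
    """before/after: lists of (key, value) sorted on key, keys unique.

    Classic two-file match: walk both sorted inputs in step; on a key
    match, emit copy/edit and advance BOTH sides before the next
    comparison. This "advance both on a match" rule is exactly what
    breaks down when a key repeats: which duplicate pairs with which
    is an accident of ordering, so change codes become unstable.
    """
    out = []
    i = j = 0
    while i < len(before) and j < len(after):
        bk, bv = before[i]
        ak, av = after[j]
        if bk == ak:
            out.append((ak, av, 0 if bv == av else 3))  # copy or edit
            i += 1
            j += 1  # both inputs advance after a key match
        elif bk < ak:
            out.append((bk, bv, 2))  # key only in before: delete
            i += 1
        else:
            out.append((ak, av, 1))  # key only in after: insert
            j += 1
    out.extend((k, v, 2) for k, v in before[i:])  # trailing deletes
    out.extend((k, v, 1) for k, v in after[j:])   # trailing inserts
    return out

before = [(1, "a"), (2, "b"), (4, "d")]
after = [(1, "a"), (2, "x"), (3, "c")]
print(change_capture(before, after))
# [(1, 'a', 0), (2, 'x', 3), (3, 'c', 1), (4, 'd', 2)]
```

With unique keys the output is fully determined by the data. With duplicates, the pairing depends on the arrival order within each key group, which a parallel hash-partitioned sort does not guarantee, so repeated runs can legitimately differ.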

Having said that... it is still possible to handle multiple version changes for a given key in a single job execution utilizing the change capture stage.

It just takes a little creativity to turn the duplicate keys into the unique keys that the stage requires.
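One way to apply Mike's hint is to append a per-key occurrence number to both inputs before the comparison, so that the compound key (key, seq) is unique. This is only a hypothetical illustration of the idea; in an actual job you might build the counter with a Transformer stage or a Sort stage key-change column, neither of which is shown here.

```python
from itertools import groupby
from operator import itemgetter

def add_sequence(rows):
    """rows: list of (key, value) already sorted on key.

    Returns ((key, seq), value) where seq numbers each duplicate
    within a key group, making the compound key unique so a
    change-capture comparison can pair versions deterministically.
    """
    out = []
    for key, group in groupby(rows, key=itemgetter(0)):
        for seq, (_, value) in enumerate(group):
            out.append(((key, seq), value))
    return out

print(add_sequence([(10, "a"), (10, "b"), (20, "c")]))
# [((10, 0), 'a'), ((10, 1), 'b'), ((20, 0), 'c')]
```

Applied to both the before and after links, the first occurrence of a key on one side always pairs with the first occurrence on the other, the second with the second, and so on, which removes the run-to-run variation the original poster is seeing.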

Mike
rameshrr3
Premium Member
Posts: 609
Joined: Mon May 10, 2004 3:32 am
Location: BRENTWOOD, TN

CDC or Change Capture

Post by rameshrr3 »

I don't know why it's common to refer to the Change Capture stage as the CDC stage; it creates quite a bit of confusion with the CDC Transaction stage.
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

I agree. It is a common misnomer. Clearly, the Change Capture stage would be abbreviated CC. CDC is different.
Choose a job you love, and you will never have to work a day in your life. - Confucius
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

So basically it's CDD, or Change Data Detection? That's what I've known it as, and as noted it's a distinctly different process from Change Data Capture.
-craig

"You can never have too many knives" -- Logan Nine Fingers
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

Yes, the Change Capture stage performs change data detection, but watch out... because "CDD" is another IBM product acronym for Change Data Delivery! :shock:
Choose a job you love, and you will never have to work a day in your life. - Confucius
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Great. Now we need ACD - Acronym Collision Detection.
-craig

"You can never have too many knives" -- Logan Nine Fingers