Need information on change data capture (DS8.7)

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


chandra.shekhar@tcs.com
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Need information on change data capture (DS8.7)

Post by chandra.shekhar@tcs.com »

Dear Team,

We have 150 flat files, each containing about 5 GB of data.

The target table contains about 50 crore (500 million) rows.

We receive a full dump of the table as flat files (150 files).

We need to update the target table with changed records only.

We have developed the job as follows:

File ------> Change Capture ------> Target Table
                   |
                   |
          Ref: Target Table

Questions:

1> Is there any stage that can hold all the reference (target table) records in memory so that we can pass the files through one by one (150 files)?
2> Can we concatenate all 150 files and pass them to the Change Capture stage so that the reference target table is read only once? (That means reading 150 * 5 = 750 GB in one pass; can that be done without a bottleneck?)
3> Please suggest a good approach for this.

Thanks
Thanx and Regards,
ETL User
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

You can use a file pattern to read all the files at once. Did you think about comparing just a checksum instead of every column? Or have you explored a way to do an incremental load? Bulk loads are very fast, even if that means truncating and reloading everything.
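Whichever way you compute it inside DataStage, the checksum idea boils down to hashing the non-key columns of each incoming row and comparing that hash with one stored for the same key; only rows whose key is new or whose hash differs need to go to the target. A rough Python sketch of the logic only (the column name cust_id and the delimited-file layout are made-up assumptions for illustration):

import csv
import hashlib

def row_checksum(row, key_cols):
    # MD5 over the concatenated non-key columns, so a change in any of them changes the hash
    payload = "|".join(v for c, v in sorted(row.items()) if c not in key_cols)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

def changed_rows(dump_file, existing_checksums, key_cols=("cust_id",)):
    # Yield only inserts/updates: rows whose key is unseen or whose checksum differs
    with open(dump_file, newline="") as fh:
        for row in csv.DictReader(fh):
            key = tuple(row[c] for c in key_cols)
            if existing_checksums.get(key) != row_checksum(row, key_cols):
                yield row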

I do not know the complete picture of the requirement or the landscape, so I cannot give a definitive answer to the "good solution" question.

As far as holding the data is concerned, you can store it in a data set to reduce overhead, or read with a file pattern and process everything at once. Either way you will need a lot of scratch space, and make sure you handle duplicates.
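For what it's worth, the file-pattern read plus last-one-wins de-duplication is conceptually just the sketch below (Python for illustration only; the pattern dump_*.txt and key column cust_id are made-up names, and inside DataStage you would use the Sequential File stage's file-pattern read and a Remove Duplicates stage instead):

import csv
import glob

def read_full_dump(pattern="dump_*.txt", key_cols=("cust_id",)):
    # Treat every file matching the pattern as one logical source and
    # keep only the last occurrence of each key (last-one-wins de-dup)
    latest = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as fh:
            for row in csv.DictReader(fh):
                latest[tuple(row[c] for c in key_cols)] = row
    return latest.values()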
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
chandra.shekhar@tcs.com
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Post by chandra.shekhar@tcs.com »

Hey,

Thanks for the reply.
We cannot always do a truncate and reload, because the next dependent job would then fetch the complete data again and de-duplication could become an issue. We want to pass only the changed records downstream.

Questions:
1> If one Sequential File stage reads all 150 files (750 GB), will that become a bottleneck?
2> Is there any stage where I can hold the reference data (45 crore rows) in memory while passing the files through one by one?

Thanks
Thanx and Regards,
ETL User
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

No, it won't be a bottleneck; a file read is almost always faster than loading into a database. Just make sure you have about 1.5 TB of scratch space, since you will be reading a similar amount of data into the "before" dataset for the Change Capture as well.
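To put numbers on it: 150 files x 5 GB = 750 GB of incoming dump data, and landing a comparable "before" image of the target for the Change Capture comparison roughly doubles that, which is where the ~1.5 TB scratch estimate comes from.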
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink: