Need information on change data capture (DS8.7)

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


chandra.shekhar@tcs.com
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Need information on change data capture (DS8.7)

Post by chandra.shekhar@tcs.com »

Dear Team,

We have 150 flat files, each containing about 5 GB of data.

The target table contains about 50 crore (500 million) rows.

We receive a full dump of the table as flat files (150 files).

We need to update the target table with changed records only.

We have developed the job as follows:

File ------> Change Capture ------> Target Table
                   |
                   |
          Ref: Target Table

Questions:

1> Is there any stage that can hold all the reference (target table) records in memory so that we can pass the files through one by one (150 files)?
2> Can we concatenate all 150 files and pass them to the Change Capture stage so that the reference target table is read only once? (That means reading 150 * 5 = 750 GB in one pass; can that be done without a bottleneck?)
3> Please suggest a good approach for this.

Thanks
Thanx and Regards,
ETL User
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

You can use a file pattern to read all the files at once. Did you think about comparing just a checksum instead of every column? Or have you explored a way to do an incremental load? Bulk loads are very fast, even if that means truncating and reloading everything.
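Whichever way you compute it inside DataStage, the checksum idea boils down to hashing the non-key columns of each incoming row and comparing that hash with one stored for the same key; only rows whose key is new or whose hash differs need to go to the target. A rough Python sketch of the logic only (the column name cust_id and the delimited-file layout are made-up assumptions for illustration):

import csv
import hashlib

def row_checksum(row, key_cols):
    # MD5 over the concatenated non-key columns, so a change in any of them changes the hash
    payload = "|".join(v for c, v in sorted(row.items()) if c not in key_cols)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

def changed_rows(dump_file, existing_checksums, key_cols=("cust_id",)):
    # Yield only inserts/updates: rows whose key is unseen or whose checksum differs
    with open(dump_file, newline="") as fh:
        for row in csv.DictReader(fh):
            key = tuple(row[c] for c in key_cols)
            if existing_checksums.get(key) != row_checksum(row, key_cols):
                yield row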

I do not know the complete picture of the requirement or the landscape, so I cannot give a definitive answer to the "good solution" question.

As far as holding the data is concerned, you can store it in a data set to reduce overhead, or read with a file pattern and process everything at once. Either way you will need a lot of scratch space, and make sure you handle duplicates.
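For what it's worth, the file-pattern read plus last-one-wins de-duplication is conceptually just the sketch below (Python for illustration only; the pattern dump_*.txt and key column cust_id are made-up names, and inside DataStage you would use the Sequential File stage's file-pattern read and a Remove Duplicates stage instead):

import csv
import glob

def read_full_dump(pattern="dump_*.txt", key_cols=("cust_id",)):
    # Treat every file matching the pattern as one logical source and
    # keep only the last occurrence of each key (last-one-wins de-dup)
    latest = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as fh:
            for row in csv.DictReader(fh):
                latest[tuple(row[c] for c in key_cols)] = row
    return latest.values()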
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
chandra.shekhar@tcs.com
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Post by chandra.shekhar@tcs.com »

Hey,

Thanks for the reply.
We cannot always do a truncate and reload, because the next dependent job would then fetch the complete data again and de-duplication could become an issue. We want to pass only the changed records downstream.

Questions:
1> If one Sequential File stage reads all 150 files (750 GB), will that become a bottleneck?
2> Is there any stage where I can hold the reference data (45 crore rows) in memory while passing the files through one by one?

Thanks
Thanx and Regards,
ETL User
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

No, it won't be a bottleneck; a file read is almost always faster than loading into a database. Just make sure you have about 1.5 TB of scratch space, since you will be reading a similar amount of data into the "before" dataset for the Change Capture as well.
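To put numbers on it: 150 files x 5 GB = 750 GB of incoming dump data, and landing a comparable "before" image of the target for the Change Capture comparison roughly doubles that, which is where the ~1.5 TB scratch estimate comes from.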
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink: