performance issue with incremental load CRC job.

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

satyasur
Participant
Posts: 15
Joined: Fri Apr 07, 2006 12:11 am

performance issue with incremental load CRC job.

Post by satyasur »

Hello all,
I am facing a performance issue with one of our CRC-based jobs.
It's taking 10 hours to load.
Our source has 15 million rows. We cannot do Change Data Capture with a timestamp because there isn't a timestamp field that tracks changed rows.
We have changed the hash file to 64-bit to accommodate the increasing hash file size, but we are concerned that this will further degrade performance.
Are there ways to improve the performance of this job? Please suggest.
(Implementing auditing at the source level would require changes on the application side, which is not possible at this point in time.)

Thanks in advance.
Regards,
Satya.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

We have no idea what your job design is, so it is impossible to recommend anything apart, perhaps, from considering multiple parallel streams of processing, if your operating system has sufficient spare resources.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sanath1234
Participant
Posts: 24
Joined: Wed Oct 17, 2007 1:28 am
Location: hyderabad

Re: performance issue with incremental load CRC job.

Post by sanath1234 »

satyasur wrote:Hello all,
I am facing a performance issue with one of our CRC-based jobs.
It's taking 10 hours to load.
Our source has 15 million rows. We cannot do Change Data Capture with a timestamp because there isn't a timestamp field that tracks changed rows.
We have changed the hash file to 64-bit to accommodate the increasing hash file size, but we are concerned that this will further degrade performance.
Are there ways to improve the performance of this job? Please suggest.
(Implementing auditing at the source level would require changes on the application side, which is not possible at this point in time.)

Thanks in advance.
Regards,
Satya.

This depends upon the job design, buffer sizes, number of transformers used and their constraints, etc.
Without the exact design flow and the problem, there is no recommended solution.
ETL=DS
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

There is a change data capture solution for Server Jobs using the CRC32 function that has been discussed in a lot of threads. It is very fast and not difficult to build, but there has been ongoing debate about whether the CRC32 function builds unique identifiers for change capture or occasionally gets the CDC wrong. Do a forum search for CRC32 for many threads on this.
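In outline (the column names here are only illustrative), the technique is to CRC the concatenated non-key columns and compare the result with the CRC stored against that key in a hashed file from the previous run, something like:

svRowCRC = CRC32(In.COL1 : '|' : In.COL2 : '|' : In.COL3)

Reference key not found in the hashed file -> new row (insert).
Found, and svRowCRC differs from the stored CRC -> changed row (update).
Found, and svRowCRC matches the stored CRC -> unchanged row, drop it.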
satyasur
Participant
Posts: 15
Joined: Fri Apr 07, 2006 12:11 am

here is some more information...

Post by satyasur »

Hello ray,
ray.wurlod wrote:We have no idea what your job design is
Our job is designed like this:


Source ----> Row Merge ----> CRC ----> Transformer ----> Row Split ----> Target.

We do the CRC hash lookup in the Transformer stage to identify the new rows.
Further, we have used two paths to load inserts and updates into the target (a rough sketch of the split is below).
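For reference, the split in our Transformer is roughly like this (link and column names are illustrative, not our real ones): SrcIn carries the CRC computed upstream, and HashRef is the reference lookup against the CRC hashed file, keyed on the natural key.

Insert link constraint: HashRef.NOTFOUND
Update link constraint: Not(HashRef.NOTFOUND) And SrcIn.ROW_CRC <> HashRef.STORED_CRC

Rows whose CRC matches the stored value fail both constraints and are simply dropped.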

ray.wurlod wrote:considering using multiple parallel streams of processing, if your operating system has sufficient sp ...

Can you kindly elaborate on that (are you referring to the Link Partitioner?).
Do I need to have multiple CPUs to implement that?
I might sound naive, please excuse me.
Thanks in advance.
Regards,
Satya.
satyasur
Participant
Posts: 15
Joined: Fri Apr 07, 2006 12:11 am

Post by satyasur »

Hello vmcburney,
vmcburney wrote:There is a change data capture solution for Server Jobs using the CRC32 function that has been discussed in a lot of threads. It is very fast and not difficult to build but there has been ongoing deb ...

We have already implemented CRC32 for CDC.
We are facing performance issues because of the huge data volume and huge hash file size.
I was looking for a workaround.
(Please excuse me if I have missed something in your reply, that is, if you are already referring to that workaround.)
Regards,
Satish.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

You could try multiple instance jobs. Partition your data on the natural key in a Transformer across multiple instances, and split your large hash file to match those partitions. Four instances let you cut your hash file into quarters, if you can find a convenient partitioning key (see the sketch below).
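As a sketch (the parameter and column names below are only examples), each instance is passed a partition number and a partition count as job parameters, the Transformer constraint keeps just that instance's slice of the keys, and the hashed file name carries the partition number so each instance reads and writes only its own piece:

Transformer constraint in each instance (numeric key assumed):
Mod(In.NATURAL_KEY, PartCount) = PartNum

Hashed File stage file name built from the same parameter, e.g. CRC_LOOKUP_#PartNum#, so instances 0 to 3 each own their own hashed file.

If the natural key is a character string, something like Mod(Abs(CRC32(In.NATURAL_KEY)), PartCount) should spread rows evenly across instances without needing a year or date column.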
satyasur
Participant
Posts: 15
Joined: Fri Apr 07, 2006 12:11 am

Post by satyasur »

Hello vmcburney,
vmcburney wrote:You could try multiple instance jobs. In a transformer partition your data on the natural key using a Transformer across multiple instances. Split your large hash files to match those partitions. F ...
Hey, this sounds interesting!
But to implement this we need to divide our data into partitions.
Say, for example, we have a year column: we can divide our data based on year, but currently I don't see such a prospective column in our table.
Besides, I have little idea of how to partition hash files!
Any inputs on how we can do this would be most welcome.
Thanks for your suggestions. Will give it a try. :)
Cheers,
Satya.