Hello all,
I am facing a performance issue with one of our CRC-based jobs.
It is taking 10 hours to load.
Our source has 15 million rows. We cannot do Change Data Capture with a timestamp because there isn't a timestamp field that tracks changed rows.
We have changed the hash file to 64-bit to accommodate the increasing hash file size, but we are concerned that this will further degrade performance.
Are there ways to improve the performance of this job? Please suggest.
(Implementing auditing at the source level would require changes on the application side, which is not possible at this point in time.)
Thanks in advance.
Regards,
Satya.
performance issue with incremental load CRC job.
We have no idea what your job design is, so it is impossible to recommend anything apart, perhaps, from considering the use of multiple parallel streams of processing, if your operating system has sufficient spare resources.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Re: performance issue with incremental load CRC job.
satyasur wrote:Hello all,
I am facing a performance issue with one of our CRC-based jobs. ...
This depends upon the job design, buffer sizes, the number of Transformer stages used and their constraints, etc.
Without the exact design flow and the problem, there is no recommended solution.
ETL=DS
There is a change data capture solution for Server Jobs using the CRC32 function that has been discussed in a lot of threads. It is very fast and not difficult to build, but there has been ongoing debate about whether the CRC32 function builds unique identifiers for change capture or occasionally gets the CDC wrong. Do a forum search for CRC32 for many threads on this.
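For what it's worth, here is a minimal sketch of that general idea, in Python rather than DataStage and with made-up column names: concatenate the non-key columns, compute a CRC32, and compare it with the CRC32 stored for the same natural key on the previous run.

    import zlib

    # reference_crcs stands in for the hashed file from the previous run:
    # natural key -> CRC32 of the non-key columns as they were last loaded.
    reference_crcs = {"C001": 123456789}  # dummy previous CRC

    def row_crc(row, non_key_cols):
        """CRC32 over the pipe-delimited non-key column values."""
        payload = "|".join(str(row[c]) for c in non_key_cols)
        return zlib.crc32(payload.encode("utf-8")) & 0xFFFFFFFF

    def classify(row, key_col, non_key_cols):
        """Return 'insert', 'update' or 'unchanged' for one source row."""
        crc = row_crc(row, non_key_cols)
        key = row[key_col]
        if key not in reference_crcs:
            return "insert"
        if reference_crcs[key] != crc:
            return "update"
        return "unchanged"

    # CRC32 is only 32 bits, so two different rows can occasionally share a value;
    # that collision risk is the debate referred to above.
    print(classify({"cust_id": "C001", "name": "Acme", "city": "Sydney"},
                   "cust_id", ["name", "city"]))

Treat this as an illustration of the technique, not as the DataStage job itself; the dictionary plays the role of the reference hash file.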
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn: Vincent McBurney
here is some more information...
Hello Ray,
ray.wurlod wrote:We have no idea what your job design is
Our job is designed like this:
source ----> Row Merge ----> CRC ----> Transformer ----> Row Split ----> Target
We do the CRC hash file lookup in the Transformer stage to identify the new rows.
Further, we have used two paths to load inserts and updates into the target.
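Purely as an illustration of that two-path split (again in Python rather than DataStage, with hypothetical column names), the Transformer's routing amounts to something like this:

    def split_rows(rows, reference_crcs, key_col, crc_col):
        """Route each row down an 'insert' or 'update' path, dropping unchanged rows.

        reference_crcs stands in for the CRC hash file lookup: natural key -> previous CRC.
        Each row is assumed to already carry its freshly computed CRC in crc_col.
        """
        inserts, updates = [], []
        for row in rows:
            previous = reference_crcs.get(row[key_col])
            if previous is None:            # key not found in the hash file: new row
                inserts.append(row)
            elif previous != row[crc_col]:  # key found but CRC changed: updated row
                updates.append(row)
            # equal CRCs: unchanged row, not sent down either path
        return inserts, updates

This is only a sketch of the constraints described in the post, not the actual job code.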
ray.wurlod wrote:considering using multiple parallel streams of processing, if your operating system has sufficient sp ...
Can you kindly elaborate on that (are you referring to the Link Partitioner stage)?
Do I need to have multiple CPUs to implement that?
I might sound naive, please excuse me.
Thanks in advance.
Regards,
Satya.
Hello vmcburney,
vmcburney wrote:There is a change data capture solution for Server Jobs using the CRC32 function that has been discussed in a lot of threads. It is very fast and not difficult to build but there has been ongoing deb ...
We have already implemented CRC32 for CDC.
We are facing performance issues because of the huge data volume and huge hash file size.
I was looking for a workaround.
(Please excuse me if I have missed something in your reply, that is, if you are already referring to that workaround.)
Regards,
Satish.
You could try multiple instance jobs. Partition your data on the natural key using a Transformer constraint, one constraint per instance. Split your large hash files to match those partitions. Four instances let you cut your hash file into quarters, if you can find a convenient partitioning key.
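As a rough illustration of that idea (in Python, with hypothetical names): derive a partition number from the natural key, run one instance per partition, and keep a separate, smaller reference file per partition. This works even without a year or date column, because the partition is computed from the key itself.

    import zlib

    NUM_PARTITIONS = 4  # four job instances, four quarter-sized hash files

    def partition_of(natural_key):
        """Map a natural key to a partition 0..NUM_PARTITIONS-1."""
        return zlib.crc32(str(natural_key).encode("utf-8")) % NUM_PARTITIONS

    def rows_for_instance(rows, instance_no, key_col):
        """Rows that a given job instance should process and look up in its own reference file."""
        return [r for r in rows if partition_of(r[key_col]) == instance_no]

In the Server job itself, the equivalent would be a Transformer constraint along the lines of Mod(CRC32(key), 4) equal to the instance number, with each instance reading and writing its own hash file; treat that as a sketch to adapt and verify against your own key columns rather than a drop-in expression.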
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn: Vincent McBurney
Hello vmcburney,
vmcburney wrote:You could try multiple instance jobs. In a transformer partition your data on the natural key using a Transformer across multiple instances. Split your large hash files to match those partitions. F ...
Hey, this sounds interesting!
But to implement this we need to divide our data into partitions.
Say, for example, we had a year column; we could divide our data based on year. But currently I don't see such a prospective column in our table.
Besides, I have little idea of how to partition hash files!
Any inputs on how we can do this would be most welcome.
Thanks for your suggestions. Will give it a try. :)
Cheers,
Satya.