Performance Issue

Posted: Thu Jan 06, 2005 12:35 pm
by yaminids
Hello there,

I designed a job with about 20 stages. From a performance standpoint, is it a good idea to keep that many stages in a single job, especially when I could break it into several smaller jobs?

Thanks in advance.
-Yamini

Posted: Thu Jan 06, 2005 12:53 pm
by DSkkk
It is always better to break the job into smaller jobs and call them from a sequence. A sequence can have savepoints (checkpoints), so you don't have to restart from the beginning when it aborts.
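
To make the restart idea concrete, here is a minimal sketch in plain Python, not DataStage code. The function name `run_sequence`, the step names, and the JSON checkpoint file are all hypothetical, chosen only for illustration; a real job sequence tracks its restart points internally.

```python
import json
import os


def run_sequence(steps, checkpoint_path):
    """Run (name, callable) steps in order, skipping steps already completed.

    Completed step names are written to checkpoint_path after each step, so a
    rerun after an abort resumes at the step that failed.
    """
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)

    executed = []
    for name, func in steps:
        if name in done:
            continue  # finished in an earlier run, do not repeat it
        func()  # an exception here aborts the whole sequence
        done.append(name)
        executed.append(name)
        with open(checkpoint_path, "w") as f:
            json.dump(done, f)  # record progress before moving on
    return executed
```

If the second of three steps aborts, the next call with the same checkpoint file runs only the second and third steps. The price is that each step must leave its output somewhere the next step can read it.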

Re: Performance Issue

Posted: Thu Jan 06, 2005 1:01 pm
by kiran_kom
yaminids wrote:Hello there,

I designed a job with about 20 stages. From a performance standpoint, is it a good idea to keep that many stages in a single job, especially when I could break it into several smaller jobs?

Thanks in advance.
-Yamini
It depends.
Assuming you have enough CPU capacity and all your intermediate stages are active stages or lookups, you would probably be better off leaving the job in one piece. Turn inter-process buffering on.

I personally don't like breaking a job into a sequence of smaller ones, because you need to dump the data to a file or table for the next job in the sequence. You lose a lot of performance to these 'data landings'.

I actually prefer having more stages in a job rather than fewer. My servers usually have 4 or 8 CPUs, so if I distribute a series of transformations between two transformers, I am better off because I am using two CPUs, as opposed to one CPU if I do everything in one transformer.
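
A rough way to see the two-transformer argument is an idealized throughput model, sketched below in Python. It assumes each stage runs on its own CPU, ignores the overhead of the buffers between stages, and uses made-up per-row costs; the function `rows_per_second` is hypothetical and is not a DataStage API.

```python
def rows_per_second(stage_costs_us, pipelined):
    """Estimate throughput from per-row stage costs given in microseconds.

    Pipelined: stages run concurrently, so the slowest stage sets the pace.
    Not pipelined: one process does all the work, so the costs add up.
    """
    per_row_us = max(stage_costs_us) if pipelined else sum(stage_costs_us)
    return 1_000_000 / per_row_us


# Two transformations costing 100 microseconds per row each (made-up numbers).
one_transformer = rows_per_second([100, 100], pipelined=False)
two_transformers = rows_per_second([100, 100], pipelined=True)
```

Under these assumptions, splitting the work evenly doubles throughput. An uneven split gains less, because the slower transformer becomes the bottleneck.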

Just my 2 cents....

Posted: Fri Jan 07, 2005 3:36 am
by andru
We work on a 16-CPU box with IPC on. We had a job with more than 20 stages that was performing very poorly. We split the job into 4 jobs with intermediate files, and performance improved considerably. So in my experience, splitting the job into chunks should improve your performance.

Re: Performance Issue

Posted: Sun Jan 09, 2005 2:50 pm
by T42
kiran_kom wrote:I personally don't like breaking a job into a sequence of smaller ones, because you need to dump the data to a file or table for the next job in the sequence. You lose a lot of performance to these 'data landings'.
Be careful there. While we're talking about Server here, data landings are fairly common, especially with very large sets of data (larger than what fits in memory). With EE, data landing is clearly defined (by using Scratch space), and high-performance disk drives make EE jobs run much faster with large datasets. With Server, it's a matter of determining how much memory you are using. If you are using a lot of memory and will be hitting swap space often, you might as well land the data for further use.
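
As a back-of-the-envelope version of that rule, the sketch below compares an estimated in-memory footprint against available memory. The function `should_land`, the 0.8 headroom factor, and the example figures are all assumptions for illustration, not DataStage settings; measure your real memory use before deciding.

```python
def should_land(row_count, avg_row_bytes, available_bytes, headroom=0.8):
    """Return True when the data is unlikely to fit in memory without swapping.

    headroom is the fraction of available memory the job may use before
    landing the data to disk looks like the safer option (an assumed value).
    """
    estimated_bytes = row_count * avg_row_bytes
    return estimated_bytes > available_bytes * headroom


GIB = 1024 ** 3
# 50 million rows of about 200 bytes each, with 4 GiB available (made-up figures).
land_big = should_land(50_000_000, 200, 4 * GIB)
# 1 million rows of the same width fit comfortably.
land_small = should_land(1_000_000, 200, 4 * GIB)
```

With these figures, the 50-million-row set (about 10 GB) exceeds the budget and would be landed, while the 1-million-row set would stay in memory.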

Re: Performance Issue

Posted: Mon Jan 10, 2005 12:22 am
by yaminids
Hello all,
Thank you very much for your suggestions.
-Yamini