Tuning a job which processes 4 million records a day

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

yaminids
Premium Member
Posts: 387
Joined: Mon Oct 18, 2004 1:04 pm

Tuning a job which processes 4 million records a day

Post by yaminids »

Hello friends,

I am trying to tune a job which processes about 4 million records a day. The job reads data from a table in database A and writes to a table in another database (B).

ODBC ==> TRANSFORMER ==> Oracle OCI

The job takes about 3 hours and commits after writing all the records.

Is there any way that I could modify the job to improve its performance?

Thanks a bunch in advance
Yamini
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Tell us a bit more about the database stages. Are you using array processing? What are your transaction handling strategies? Are all the writes inserts or are there updates as well?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
yaminids
Premium Member
Posts: 387
Joined: Mon Oct 18, 2004 1:04 pm

Post by yaminids »

Ray,

We are not using array processing. Also, we have set 'Rows per transaction' to zero so as to commit after writing all the records to the database.

Thanks
Yamini
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Are all the writes inserts or are there updates as well?
-craig

"You can never have too many knives" -- Logan Nine Fingers
yaminids
Premium Member
Posts: 387
Joined: Mon Oct 18, 2004 1:04 pm

Post by yaminids »

All of the writes are inserts only.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Why not bulk load then? :?

Two jobs - first land the data from the ODBC source. Second, either script a sqlldr session or use the ORAOCI Bulk stage.
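
As a rough sketch only (the table name, column list, delimiter and paths are invented for illustration - substitute your own), the second step could be scripted something like this:

#!/bin/sh
# Job 1 (DataStage): ODBC source ==> Sequential File stage, landing the day's
# rows as a pipe-delimited flat file, e.g. /data/land/daily.dat.
# Job 2 / this script: load that file into database B with SQL*Loader.

# Write a control file for the landed data (table and columns are placeholders).
cat > /data/land/daily.ctl <<'EOF'
LOAD DATA
INFILE '/data/land/daily.dat'
APPEND
INTO TABLE TARGET_TABLE
FIELDS TERMINATED BY '|'
TRAILING NULLCOLS
(COL1, COL2, COL3)
EOF

# Run the load; direct=true uses the direct path instead of conventional inserts.
sqlldr userid=dbuser/dbpass@dbB control=/data/land/daily.ctl \
       log=/data/land/daily.log bad=/data/land/daily.bad direct=true

if [ $? -ne 0 ]; then
    echo "sqlldr load failed - check /data/land/daily.log" >&2
    exit 1
fi

Direct path loading bypasses conventional SQL inserts, which is where most of the speed advantage over row-by-row OCI writes comes from.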
-craig

"You can never have too many knives" -- Logan Nine Fingers
umamahes
Premium Member
Posts: 110
Joined: Tue Jul 04, 2006 9:08 pm

Post by umamahes »

Are you using 'Insert rows without clearing' as the update action in the OCI stage? That is why you are committing only after all of the records have been loaded.

If you commit only after writing all of the records and the job aborts after 2:30 hours, you have to start again from the beginning (a disadvantage).

If you want, you can also do change capture to avoid updates.

Use an array size; otherwise, if you commit only at the end, it will also fill up the redo space in the database.
yaminids
Premium Member
Posts: 387
Joined: Mon Oct 18, 2004 1:04 pm

Post by yaminids »

Craig,

Can we use the ORAOCI BULK stage if DataStage (Linux) and database (AIX) are on two different servers?

Thanks
Yamini
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

umamahes wrote: Are you using 'Insert rows without clearing' as the update action in the OCI stage? That is why you are committing only after all of the records have been loaded.

I doubt that. And that's certainly not the only reason to commit at the end of the load.
umamahes then wrote: If you commit only after writing all of the records and the job aborts after 2:30 hours, you have to start again from the beginning (a disadvantage).
Some of us consider that an advantage.
umamahes also wrote: If you want, you can also do change capture to avoid updates.
Umm... there are no updates to avoid. Inserts only, remember?
umamahes lastly wrote: Use an array size; otherwise, if you commit only at the end, it will also fill up the redo space in the database.
Array Size <> Transaction size. Get more redo space if you need it.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

yaminids wrote: Can we use the ORAOCI BULK stage if DataStage (Linux) and database (AIX) are on two different servers?
Sure.
-craig

"You can never have too many knives" -- Logan Nine Fingers
yaminids
Premium Member
Posts: 387
Joined: Mon Oct 18, 2004 1:04 pm

Post by yaminids »

Craig,

I have increased the Array size to 5000 and the job completed in less than 30 minutes.
Is it a good practice to commit after writing all the records or should I change the 'Rows per transaction' value to commit multiple times?

Thanks
Yamini
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I usually prefer 0 rows per transaction (= commit after all rows loaded) when using OCI. That makes it "all or nothing" and easier to recover.

Beware of the ORABULK stage - it is woefully inefficient (it has been around since version 1.0). Use it to write the CTL file, but use a Sequential File stage to write the DAT file. This will be orders of magnitude faster.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

We stick with 0 as the transaction size for the same reasons Ray noted, and that is why I consider it an 'advantage' - we get an "all or nothing" load. It greatly simplifies restart after failure: once the problem has been corrected, just rerun the job, with no recovery / restart logic needed.

Ray, you're right about the ORABULK stage a.k.a. the Oracle 7 Load stage. However, I do use the newer ORAOCIBL or Oracle OCI Load stage to good effect. Sure, there are times when I use it to just create the control file by sending zero records to it in Manual mode and use a Sequential stage for the actual data. That way I have full control over the load and can tweak the ctl file on the fly during a scripted load. It also means I get a .log file that I can archive, something the Automatic mode lacks. But, depending on the volume or frequency, I may still go automatic.

Make sure your data is well prepared for Automatic mode, however. Write first to a flat file and triple-check the transformations you are doing. I say this because problems during the load can be... messy. They don't seem to count against any warning limit you may have established, so a problem during a 4 million record load could mean 4 million warnings in the job's log. :wink:
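
As a rough sketch of the scripted route only (the paths, table name and the @DATFILE@ placeholder are invented for illustration), such a load might look like this:

#!/bin/sh
# Scripted SQL*Loader run: tweak the ctl file on the fly, load, archive the log.
RUN_STAMP=`date +%Y%m%d_%H%M%S`
CTL_TEMPLATE=/data/ctl/target_table.ctl.template   # written once, e.g. by the OCI Load stage in Manual mode
CTL_FILE=/data/ctl/target_table_${RUN_STAMP}.ctl
DAT_FILE=/data/land/target_table.dat               # written by the Sequential File stage
LOG_FILE=/data/logs/target_table_${RUN_STAMP}.log

# "Tweak the ctl file on the fly": drop today's data file name into the template.
sed "s|@DATFILE@|${DAT_FILE}|" ${CTL_TEMPLATE} > ${CTL_FILE}

# errors=0 stops the load at the first rejected row instead of piling up rejects.
sqlldr userid=dbuser/dbpass@dbB control=${CTL_FILE} log=${LOG_FILE} errors=0
STATUS=$?

# Archive the timestamped log whether the load worked or not.
cp ${LOG_FILE} /data/archive/

if [ ${STATUS} -ne 0 ]; then
    echo "Load failed (status ${STATUS}) - see ${LOG_FILE}" >&2
fi
exit ${STATUS}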
-craig

"You can never have too many knives" -- Logan Nine Fingers
DeepakCorning
Premium Member
Posts: 503
Joined: Wed Jun 29, 2005 8:14 am

Post by DeepakCorning »

What changes did you incorporate in your job? The array size plus the OCI stage?
Thanks
Deepak Patil

Convince Them Confuse Them .. What's the difference?
yaminids
Premium Member
Posts: 387
Joined: Mon Oct 18, 2004 1:04 pm

Post by yaminids »

We were using the OCI stage from the beginning, but it was performing poorly. Only after increasing the array size did the job pick up the pace.