Tuning a job which processes 4 million records a day

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

yaminids
Premium Member
Posts: 387
Joined: Mon Oct 18, 2004 1:04 pm

Tuning a job which processes 4 million records a day

Post by yaminids »

Hello friends,

I am trying to tune a job which processes about 4 million records a day. The job reads data from a table in database A and writes to a table in another database (B).

ODBC ==> TRANSFORMER ==> Oracle OCI

The job takes about 3 hours and commits after writing all the records.

Is there any way that I could modify the job to improve its performance?

Thanks a bunch in advance
Yamini
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Tell us a bit more about the database stages. Are you using array processing? What are your transaction handling strategies? Are all the writes inserts or are there updates as well?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
yaminids
Premium Member
Posts: 387
Joined: Mon Oct 18, 2004 1:04 pm

Post by yaminids »

Ray,

We are not using array processing. Also, we have set 'Rows per transaction' to zero so as to commit after writing all the records to the database.

Thanks
Yamini
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Are all the writes inserts or are there updates as well?
-craig

"You can never have too many knives" -- Logan Nine Fingers
yaminids
Premium Member
Posts: 387
Joined: Mon Oct 18, 2004 1:04 pm

Post by yaminids »

All of the writes are inserts only.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Why not bulk load then? :?

Two jobs - first land the data from the ODBC source. Second, either script a sqlldr session or use the ORAOCI Bulk stage.
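
As a rough sketch only (the table name, column list, delimiter and paths are invented for illustration - substitute your own), the second step could be scripted something like this:

#!/bin/sh
# Job 1 (DataStage): ODBC source ==> Sequential File stage, landing the day's
# rows as a pipe-delimited flat file, e.g. /data/land/daily.dat.
# Job 2 / this script: load that file into database B with SQL*Loader.

# Write a control file for the landed data (table and columns are placeholders).
cat > /data/land/daily.ctl <<'EOF'
LOAD DATA
INFILE '/data/land/daily.dat'
APPEND
INTO TABLE TARGET_TABLE
FIELDS TERMINATED BY '|'
TRAILING NULLCOLS
(COL1, COL2, COL3)
EOF

# Run the load; direct=true uses the direct path instead of conventional inserts.
sqlldr userid=dbuser/dbpass@dbB control=/data/land/daily.ctl \
       log=/data/land/daily.log bad=/data/land/daily.bad direct=true

if [ $? -ne 0 ]; then
    echo "sqlldr load failed - check /data/land/daily.log" >&2
    exit 1
fi

Direct path loading bypasses conventional SQL inserts, which is where most of the speed advantage over row-by-row OCI writes comes from.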
-craig

"You can never have too many knives" -- Logan Nine Fingers
umamahes
Premium Member
Posts: 110
Joined: Tue Jul 04, 2006 9:08 pm

Post by umamahes »

Are you using 'Insert rows without clearing' as the update action in the OCI stage? That is why you are committing only after all of the records have been loaded.

If you commit only after writing all of the records and the job aborts after 2:30 hours, you have to start again from the beginning (a disadvantage).

If you want, you can also do change capture to avoid updates.

Use an array size; otherwise, if you commit only at the end, it will also fill up the redo space in the database.
yaminids
Premium Member
Posts: 387
Joined: Mon Oct 18, 2004 1:04 pm

Post by yaminids »

Craig,

Can we use the ORAOCI BULK stage if DataStage (Linux) and database (AIX) are on two different servers?

Thanks
Yamini
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

umamahes wrote: Are you using 'Insert rows without clearing' as the update action in the OCI stage? That is why you are committing only after all of the records have been loaded.

I doubt that. And that's certainly not the only reason to commit at the end of the load.
umamahes then wrote: If you commit only after writing all of the records and the job aborts after 2:30 hours, you have to start again from the beginning (a disadvantage).
Some of us consider that an advantage.
umamahes also wrote: If you want, you can also do change capture to avoid updates.
Umm... there are no updates to avoid. Inserts only, remember?
umamahes lastly wrote: Use an array size; otherwise, if you commit only at the end, it will also fill up the redo space in the database.
Array Size <> Transaction size. Get more redo space if you need it.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

yaminids wrote: Can we use the ORAOCI BULK stage if DataStage (Linux) and database (AIX) are on two different servers?
Sure.
-craig

"You can never have too many knives" -- Logan Nine Fingers
yaminids
Premium Member
Posts: 387
Joined: Mon Oct 18, 2004 1:04 pm

Post by yaminids »

Craig,

I have increased the Array size to 5000 and the job completed in less than 30 minutes.
Is it a good practice to commit after writing all the records or should I change the 'Rows per transaction' value to commit multiple times?

Thanks
Yamini
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I usually prefer 0 rows per transaction (= commit after all rows loaded) when using OCI. That makes it "all or nothing" and easier to recover.

Beware of the ORABULK stage - it is woefully inefficient (it has been around since version 1.0). Use it to write the CTL file, but use a Sequential File stage to write the DAT file. This will be orders of magnitude faster.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

We stick with 0 as the transaction size for the same reasons Ray noted, and that is why I consider it an 'advantage' - we get an "all or nothing" load. It greatly simplifies restart after failure: once the problem has been corrected, just rerun the job, with no recovery / restart logic needed.

Ray, you're right about the ORABULK stage a.k.a. the Oracle 7 Load stage. However, I do use the newer ORAOCIBL or Oracle OCI Load stage to good effect. Sure, there are times when I use it to just create the control file by sending zero records to it in Manual mode and use a Sequential stage for the actual data. That way I have full control over the load and can tweak the ctl file on the fly during a scripted load. It also means I get a .log file that I can archive, something the Automatic mode lacks. But, depending on the volume or frequency, I may still go automatic.

Make sure your data is well prepared for Automatic mode, however. Write first to a flat file and triple-check the transformations you are doing. I say this because problems during the load can be... messy. They don't seem to count against any warning limit you may have established, so a problem during a 4 million record load could mean 4 million warnings in the job's log. :wink:
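
As a rough sketch of the scripted route only (the paths, table name and the @DATFILE@ placeholder are invented for illustration), such a load might look like this:

#!/bin/sh
# Scripted SQL*Loader run: tweak the ctl file on the fly, load, archive the log.
RUN_STAMP=`date +%Y%m%d_%H%M%S`
CTL_TEMPLATE=/data/ctl/target_table.ctl.template   # written once, e.g. by the OCI Load stage in Manual mode
CTL_FILE=/data/ctl/target_table_${RUN_STAMP}.ctl
DAT_FILE=/data/land/target_table.dat               # written by the Sequential File stage
LOG_FILE=/data/logs/target_table_${RUN_STAMP}.log

# "Tweak the ctl file on the fly": drop today's data file name into the template.
sed "s|@DATFILE@|${DAT_FILE}|" ${CTL_TEMPLATE} > ${CTL_FILE}

# errors=0 stops the load at the first rejected row instead of piling up rejects.
sqlldr userid=dbuser/dbpass@dbB control=${CTL_FILE} log=${LOG_FILE} errors=0
STATUS=$?

# Archive the timestamped log whether the load worked or not.
cp ${LOG_FILE} /data/archive/

if [ ${STATUS} -ne 0 ]; then
    echo "Load failed (status ${STATUS}) - see ${LOG_FILE}" >&2
fi
exit ${STATUS}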
-craig

"You can never have too many knives" -- Logan Nine Fingers
DeepakCorning
Premium Member
Posts: 503
Joined: Wed Jun 29, 2005 8:14 am

Post by DeepakCorning »

What changes did you incorporate in your job? The array size plus the OCI stage?
Thanks
Deepak Patil

Convince Them Confuse Them .. What's the difference?
yaminids
Premium Member
Posts: 387
Joined: Mon Oct 18, 2004 1:04 pm

Post by yaminids »

We were using the OCI stage from the beginning, but it was performing poorly. Only after increasing the array size did the job pick up the pace.