Performance of Sequential File Stage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

IBMDS4KK
Participant
Posts: 20
Joined: Tue Sep 10, 2013 5:50 am

Performance of Sequential File Stage

Post by IBMDS4KK »

Hi,

Job Design : Sequential File --> Transformer --> Oracle Connector

Description : The source file is around 7 GB and contains 9 million records. These records have to be inserted into the target table.

Issue Description : The job run time is around 4 hours, which the business considers too long.

Hint : The job runs on one node, and the records are not fixed length.

Trial and Error : In the Sequential File stage I tried the option 'Number of readers per node = 2', and in the Oracle Connector stage 'Record count = 10000 and Array size = 20000' (the original job used 'Record count = 2000 and Array size = 5000'). It made no difference; the run time was almost the same before and after the change.

Please suggest something to reduce the job run time.
Kiran Kumar

...ALL IS WELL...
thompsonp
Premium Member
Posts: 205
Joined: Tue Mar 01, 2005 8:41 am

Re: Performance of Sequential File Stage

Post by thompsonp »

How long do "the business" think the job should take? :lol:

You can performance tune for a long time; best to have a realistic target in mind.

You should establish which part(s) of the job are causing the perceived bottleneck. Judging by the subject line you have assumed it is the Sequential File Stage.

How long does the job take to run if you remove the Oracle Connector (and subsequently the transformer)?
Is the job CPU- or I/O-bound?

What options are you using to load the data? Inserts, Upserts, bulk load?
srinivas.g
Participant
Posts: 251
Joined: Mon Jun 09, 2008 5:52 am

Post by srinivas.g »

Why are you running on one node? If possible, run with 8 nodes and see.
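For context, the degree of parallelism comes from the APT configuration file referenced by `APT_CONFIG_FILE`, not from the job design itself. A minimal two-node sketch is below; the hostname and resource paths are hypothetical, and the same pattern extends to 8 nodes:

```
{
  node "node1"
  {
    fastname "dsserver"   /* hypothetical hostname */
    pools ""
    resource disk "/data/ds/d1" {pools ""}
    resource scratchdisk "/data/ds/s1" {pools ""}
  }
  node "node2"
  {
    fastname "dsserver"
    pools ""
    resource disk "/data/ds/d2" {pools ""}
    resource scratchdisk "/data/ds/s2" {pools ""}
  }
}
```

Each additional node block adds another parallel partition, at the cost of more CPU, memory, and scratch disk on the server.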
Srinu Gadipudi
srinivas.g
Participant
Posts: 251
Joined: Mon Jun 09, 2008 5:52 am

Post by srinivas.g »

The number of columns in your job also matters here.
Srinu Gadipudi
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Why don't you measure the baseline performance? How fast can your server read this volume of data without DataStage? Create a DataStage job consisting only of a Sequential File stage and a Copy stage with no output. How fast does that read the file?
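The baseline Ray describes can also be measured outside DataStage entirely. A minimal sketch in Python follows; it writes a small dummy file so it is self-contained, but in practice you would point `path` at the real 7 GB source file:

```python
import os
import tempfile
import time

# Create a small sample file so the sketch runs anywhere;
# replace `path` with the actual source file to get a real baseline.
with tempfile.NamedTemporaryFile(delete=False, suffix=".dat") as tmp:
    tmp.write(b"x" * (8 * 1024 * 1024))  # 8 MiB of dummy data
    path = tmp.name

start = time.perf_counter()
nbytes = 0
with open(path, "rb") as f:
    # Stream the file in 1 MiB blocks, counting bytes read.
    while chunk := f.read(1 << 20):
        nbytes += len(chunk)
elapsed = time.perf_counter() - start
os.unlink(path)

print(f"read {nbytes / 1e6:.1f} MB in {elapsed:.4f} s")
```

If the raw read rate is far faster than the job's overall throughput, the bottleneck is downstream of the Sequential File stage.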

Multiple readers may help. Play with that certainly.

Array size and record count are irrelevant to sequential file processing, which uses STREAMS I/O.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Exactly. I wager 400 Quatloos on Oracle being the sole 'bottleneck' here.
-craig

"You can never have too many knives" -- Logan Nine Fingers
IBMDS4KK
Participant
Posts: 20
Joined: Tue Sep 10, 2013 5:50 am

Post by IBMDS4KK »

Hi,

Thanks to all. I solved it by running multiple instances of the job.

Answers to Queries :

Oracle table columns = 266
Business requirements = the job has to run on one node and finish within 1 hour
Loading data = insert mode
Kiran Kumar

...ALL IS WELL...
daignault
Premium Member
Posts: 165
Joined: Tue Mar 30, 2004 2:44 pm
Contact:

Post by daignault »

Another way to speed up processing of sequential files is to read the data as a single column, since reading the CSV is inherently a sequential operation, and then parse it into columns downstream. The file access stays sequential, but the parsing of the data into columns is done in parallel.

Regards

Ray D
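In a DataStage job this pattern means reading each line as one long varchar column and splitting it downstream (for example with a Column Import stage) on multiple partitions. A rough Python analogue of the shape of the pattern is below; the thread pool only illustrates handing the per-line parsing to parallel workers, whereas DataStage partitions are separate processes:

```python
from concurrent.futures import ThreadPoolExecutor

def parse(line):
    # The per-line CSV split is the work that can run in parallel.
    return line.rstrip("\n").split(",")

# Stand-in for the sequential, single-column read of the file.
lines = ["a,1,x\n", "b,2,y\n", "c,3,z\n"]

# Fan the parsing out across workers while the read stays sequential.
with ThreadPoolExecutor(max_workers=4) as ex:
    rows = list(ex.map(parse, lines))

print(rows)  # [['a', '1', 'x'], ['b', '2', 'y'], ['c', '3', 'z']]
```

The key point is the separation of concerns: one sequential reader, many parallel parsers.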