performance issue

Post questions here related to DataStage Server Edition, for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

smoffa
Premium Member
Posts: 40
Joined: Wed Jun 30, 2004 6:00 am
Location: Orlando, FL

performance issue

Post by smoffa »

We are extracting data via an ODBC (Progress) connection directly to a txt file. The larger the target txt file gets, the slower the extraction gets, and the job eventually aborts. Example: the job starts off at 150 rows/sec, slows to 66 rows/sec, then aborts. Additionally, it never aborts at the same place.

Why does the job slow down so much? And more importantly, why is the job aborting?

DataStage error is:

SQLSTATE=S1000, DBMS.CODE=-20211
[DataStage][SQL Client][ODBC][DataDirect][ODBC PROGRESS driver][PROGRESS]Server not reachable or possible RPC error

Is there some kind of timeout on the Source side or the DataStage side?

Thanks, Steve
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
there might be a timeout, but stating the ODBC driver used, the DB in question, the platform hosting the DB and so on might help us all to point you in the right direction.

In addition, I recommend you check the performance without the ODBC stage.
Just use a transformer with a dummy stage variable, a constraint of @OUTROWNUM < number_of_rows_to_be_written, and a derivation that populates a big text file, to see whether you get the same performance degradation as the file grows (see the sketch below).
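A minimal sketch of that test job, assuming a source-less Transformer feeding a Sequential File stage (in Server jobs a Transformer with no input link keeps generating rows while its constraint holds). The row count and column derivations are illustrative only:

Code:
   Transformer (no input link)  -->  Sequential File

   Stage variable:
      svDummy = 1                        * dummy variable, per Roy's note
   Output link constraint:
      @OUTROWNUM < 1000000               * number_of_rows_to_be_written
   Column derivations:
      ROWNUM  = @OUTROWNUM               * running row counter
      PADDING = Str("X", 100)            * ~100 bytes per row so the file grows

If this job also slows down as the file grows, the degradation is on the DataStage/file-system side; if it runs at a steady rate, the ODBC source is the prime suspect.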

(hmmm, this is my 700th post :shock: )

IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting :)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Hello smoffa,

I would put a Transformer stage in there with a constraint of @FALSE or "1=2" so that no records are actually written; then run the job and see if the performance degradation is still there. At least you will remove one possible error source (see the sketch below).
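A sketch of that throw-away version, keeping the ODBC source in place; stage and link names are illustrative:

Code:
   ODBC (Progress)  -->  Transformer  -->  Sequential File

   Output link constraint:
      @FALSE                             * or 1 = 2; every row is discarded

The source still executes the SQL and ships rows through the ODBC driver, but DataStage discards them, so any remaining slowdown cannot be blamed on writing the target file.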
Alexandre
Participant
Posts: 12
Joined: Fri Dec 31, 2004 5:04 am
Location: Paris
Contact:

Post by Alexandre »

150 rows/sec is quite slow...
Have you checked the Director log to see if there are warnings (too many columns or things like that)?
I once had a performance problem due to a warning issued on every row; once the warning was removed, performance took off...
datastage
Participant
Posts: 229
Joined: Wed Oct 23, 2002 10:10 am
Location: Omaha

Post by datastage »

Also, tell us what the job is doing in terms of row buffering, and if row buffering is on, whether it is in-process or inter-process. Or at least run the job with different row-buffering settings and see which ones help or hurt.
Byron Paul
WARNING: DO NOT OPERATE DATASTAGE WITHOUT ADULT SUPERVISION.

"Strange things are afoot in the reject links" - from Bill & Ted's DataStage Adventure
smoffa
Premium Member
Posts: 40
Joined: Wed Jun 30, 2004 6:00 am
Location: Orlando, FL

Post by smoffa »

Thanks for all the responses. Here are the answers to all of your questions:

The database that I am connecting to is PROGRESS SQL92 V9.1D
ODBC driver is DataDirect 4.1

I tried adding a transformer with a 1 = 2 constraint. The extract still progressively slowed down.

I tried both in-process and inter-process row buffering. The job started a little faster but still progressively slowed and eventually errored out.

I am now thinking that the problem is on the Progress DB side, but I have not been able to find the cause yet.

Thanks again, Steve
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

If your Progress server is hosting DataStage as well, you could be running into resource contention. If they're separate hosts, then you're running into some communications jam or timeout between machines.

When I've seen this problem before, the fix was in the ODBC connection manager, where you have to adjust some of the switches. We saw a lot of connection timeouts when the server was having trouble servicing connections under heavy load. Sorry, it's been a few years and I can't remember it exactly.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
smoffa
Premium Member
Posts: 40
Joined: Wed Jun 30, 2004 6:00 am
Location: Orlando, FL

Post by smoffa »

Ken,

The DS server is in Orlando and the Progress db that we are extracting from is in Houston.

Being an old Pick guy and relatively new to DS and ODBC, can you tell me what you mean by the "ODBC connection manager" and "switches"? If you're talking about the setup options when I created the DSN, there were only two: 1) Isolation level and 2) Fetch Array size. Changing either setting makes no difference.

thanks, Steve

ps. Terry H. says Hey!
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

I figured it was you guys; I didn't know you went the Windoze route. When you set up the DSN in the ODBC Data Sources tab on the Windoze server, you picked the ODBC driver for Progress. A lot of ODBC drivers have a bunch of timing values you can monkey with (see the hedged example below).
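For illustration only, this is roughly what such settings look like in odbc.ini form on UNIX; on Windows the same attributes show up as fields in the DSN setup dialog or as registry values under the DSN's key. Every name and value below is an assumption modeled on typical DataDirect drivers, not a confirmed option of the 4.1 Progress driver, so check the driver documentation for the real spellings:

Code:
   [ProgressHOU]
   ; all attribute names and values below are hypothetical
   HostName=houston-db-host         ; assumed: the remote Progress host
   PortNumber=2500                  ; assumed: broker/server port
   LoginTimeout=0                   ; assumed: 0 = wait indefinitely to connect
   QueryTimeout=0                   ; assumed: disable the statement timeout
   ArraySize=1000                   ; the "Fetch Array size" option Steve mentioned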

ArndW's suggestion to put a constraint of @FALSE causes the source database to execute the SQL and pass the data across the network through the ODBC driver protocols, and then DS throws the data away. This should cause no backpressure up the pipe. Sometimes the act of writing the data to another database, or doing heavy transformation, causes undue delay, and the source system gets bored or something and the connection times out. Other times the rollback segment holding the selection set gets too old because of heavy transactional activity.

Your problem has the symptoms of the database being unable to dump out the dataset fast enough. Because the DS side is not causing any pressure back up the pipe, the problem seems more centered around Progress. Have you put a row limit on the query and seen that the dump actually finishes? Is your SQL really nasty, possibly spinning across vast reaches of data that doesn't meet its criteria, only occasionally finding qualifying rows, and puking if too much time passes by? Are you doing heavy aggregation or subselects that cause the database to barf after a while because of the sheer effort?

Do you have an alternative path to dump out the data? I don't believe Progress has a high-performance bulk unloader, but if your volume is huge then you may consider going that route: dump to a file and then transfer it. Alternatively, limit yourself to a 1/N portion of the data at a time, using job instances to run multiple occurrences of the same extract job, each with an AND MOD(primarykey, N) = invocationnumber - 1 clause to partition the data, dumping to separate output files (filename_#invocationnumber#.out or something), and then cat them together (see the sketch below).
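A minimal sketch of that partitioned extract for N = 4 instances; the table, column, and file names are made up for illustration:

Code:
   -- SQL for the job instance with invocation number 2 (of 4)
   SELECT order_id, order_date, amount
   FROM   pub.orders
   WHERE  MOD(order_id, 4) = 1      -- invocationnumber - 1 = 2 - 1

   -- each instance writes orders_#invocationnumber#.out; afterwards:
   --   cat orders_1.out orders_2.out orders_3.out orders_4.out > orders_all.out

Each instance pulls only its own quarter of the rows, so four shorter-running extracts replace the single long-running cursor that keeps timing out.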
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle