performance issue

Posted: Sat Jan 08, 2005 12:38 pm
by smoffa
We are extracting data via an ODBC (Progress) connection directly to a txt file. The larger the target txt file gets, the slower the extraction gets, and the job eventually aborts. Example: the job starts off at 150 rows/sec, slows to 66 rows/sec, then aborts. Additionally, it never aborts at the same place.

Why does the job slow down so much? And more importantly, why is the job aborting?

DataStage error is:

SQLSTATE=S1000, DBMS.CODE=-20211
[DataStage][SQL Client][ODBC][DataDirect][ODBC PROGRESS driver][PROGRESS]Server not reachable or possible RPC error

Is there some kind of timeout on the Source side or the DataStage side?

Thanks, Steve

Posted: Sun Jan 09, 2005 2:33 am
by roy
Hi,
there might be a timeout, but stating the ODBC driver used, the DB in question, the platform hosting the DB, and so on would help us all point you in the right direction.

In addition, I recommend you check the performance without the ODBC connection:
just use a transformer with a dummy stage variable, a constraint of @OUTROWNUM < number_of_rows_to_be_written, and a derivation to populate a big text file, and see if you get the same performance degradation as the file grows.
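That sanity check can also be mimicked outside DataStage entirely. The sketch below is plain Python standing in for a transformer feeding a Sequential File stage (an assumption; the file name and row counts are made up for illustration). It writes a big text file in chunks and records rows/sec per chunk, so a slowdown caused purely by file growth would show up with no ODBC in the picture:

```python
import time

# Write a big text file in fixed-size chunks and record the write rate
# for each chunk. If rates fall as the file grows, the degradation is
# on the file-writing side, not the ODBC source.
def write_and_time(path, total_rows=50_000, chunk=10_000):
    rates = []
    with open(path, "w") as f:
        written = 0
        while written < total_rows:
            start = time.perf_counter()
            for i in range(chunk):
                f.write(f"row {written + i}, some dummy payload text\n")
            elapsed = time.perf_counter() - start
            written += chunk
            rates.append(chunk / elapsed)  # rows/sec for this chunk
    return rates

rates = write_and_time("big_test_file.txt")
print(["%.0f rows/sec" % r for r in rates])
```

If the per-chunk rate stays flat here but the DataStage job still slows down, the degradation is upstream of the file write.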

(hmmm this is my 700th post :shock: )

IHTH,

Posted: Sun Jan 09, 2005 7:40 am
by ArndW
Hello smoffa,

I would put a transform stage in there with a constraint of FALSE or "1=2" so that no records are actually written; then run the job and see if the performance degradation is still there - at least you will remove one possible error source.

Posted: Mon Jan 10, 2005 4:34 am
by Alexandre
150 rows/sec is quite slow...
Have you checked the Director to see if there are any warnings (too many columns or things like that)?
I once had a performance problem due to a warning issued for each row; once the warning was removed, the performance took off...

Posted: Mon Jan 10, 2005 9:43 am
by datastage
Also, tell us what the job is doing in terms of row buffering, and if row buffering is on, whether it is in-process or inter-process. Or at least run the job with different row buffering settings and see which settings help or hurt.

Posted: Mon Jan 10, 2005 3:42 pm
by smoffa
Thanks for all the responses. Here are the answers to your questions:

The database that I am connecting to is PROGRESS SQL92 V9.1D
ODBC driver is DataDirect 4.1

I tried adding a transformer with a 1 = 2 constraint. The extract still progressively slowed down.

I did try both in-process and inter-process row buffering. The job started a little faster but still progressively slowed and eventually errored out.

I am now thinking that the problem is on the Progress DB side, but I have not been able to find the cause yet.

Thanks again, Steve

Posted: Mon Jan 10, 2005 4:00 pm
by kcbland
If your Progress server is hosting DataStage as well, you could be getting into some resource contention. If they're separate hosts, then you're getting into some communications jam/timeout between machines.

When I've seen this problem before, the fix was in the ODBC connection manager, adjusting some of the switches. We saw a lot of connections time out when the server was having trouble servicing connections under heavy load. Sorry, it's been a few years and I can't remember it exactly.

Posted: Mon Jan 10, 2005 4:26 pm
by smoffa
Ken,

The DS server is in Orlando and the Progress db that we are extracting from is in Houston.

Being an old Pick guy and relatively new to DS and ODBC, can you tell me what you mean by the "ODBC connection manager" and "switches"? If you're talking about the setup options when I created the DSN, there were only two: 1) Isolation level and 2) Fetch Array size. Changing either setting does not make a difference.

thanks, Steve

ps. Terry H. says Hey!

Posted: Mon Jan 10, 2005 9:56 pm
by kcbland
I figured it was you guys; I didn't know you went the Windoze route. When you set up the DSN in the ODBC data sources tab on the Windoze server, you picked the ODBC driver for Progress. A lot of ODBC drivers have a bunch of timing values you can monkey with.

ArndW's suggestion to put in a constraint of @FALSE causes the source database to execute the SQL and pass the data across the network through the ODBC driver protocols, and then DS throws the data away. This should cause no backpressure up the pipe. Sometimes the act of writing the data to another database, or doing heavy transformation, causes undue delay and the source system gets bored or something and the connection times out. Other times the rollback segment holding the selection set gets too old because of heavy transactional activity.

Your problem has the symptoms of the database being unable to dump out the dataset fast enough. Because the DS side is not causing any pressure back up the pipe, the problem seems more centered on Progress. Have you put a row limit on the query and seen that the dump actually finishes? Is your SQL really nasty, possibly spending a large amount of time spinning across vast reaches of data that doesn't meet its criteria, only occasionally finding qualifying rows, and puking if too much time passes? Are you doing heavy aggregation or subselects that cause the database to barf after a while from the sheer effort?

Do you have an alternative path to dump out the data? I don't believe Progress has a high-performance bulk unloader, but if your volume is huge then you may consider going that route and dumping to a file and then transferring it. You might also consider limiting yourself to a 1/N portion of the data at a time, using job instances to run multiple occurrences of the same extract job, each with an AND MOD(primarykey, N) = invocationnumber-1 clause to partition the data, dumping to separate output files (filename_#invocationnumber#.out or something), and then cat them together.
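That divide-and-conquer plan can be sketched outside DataStage in plain Python (an assumption as a stand-in for N job invocations; N, the file names, and the toy row set are all made up for illustration). Each "invocation" n handles the rows where MOD(primary_key, N) = n - 1, writes its own output file, and the pieces are concatenated at the end:

```python
# Partition rows across N "invocations" by primary key modulus, write
# each partition to its own file, then concatenate the pieces.
N = 4
rows = [(pk, f"payload-{pk}") for pk in range(20)]  # stand-in for the source table

part_names = []
for invocation in range(1, N + 1):
    name = f"extract_{invocation}.out"
    part_names.append(name)
    with open(name, "w") as out:
        for pk, payload in rows:
            # the AND MOD(primarykey, N) = invocationnumber-1 clause
            if pk % N == invocation - 1:
                out.write(f"{pk},{payload}\n")

# Finally, "cat" the pieces back together into one file
with open("extract_all.out", "w") as combined:
    for name in part_names:
        with open(name) as part:
            combined.write(part.read())
```

The point of the modulus split is that every source row lands in exactly one partition, so the concatenated result is the full extract, while each invocation's query scans a disjoint 1/N slice.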