Create SQL statements instead of using database stages

clarcombe
Premium Member
Posts: 515
Joined: Wed Jun 08, 2005 9:54 am
Location: Europe

Create SQL statements instead of using database stages

Post by clarcombe »

We are currently investigating very poor performance with our inserts to our DB2 z/OS machine.

Deletes run at around 100 rows per second, as do the inserts. One alternative we are looking at is to create the SQL queries ourselves and transfer them to the mainframe via MQ.

Are there any database stages which will actually create the SQL instead of running it? Of course I could write a job to do this, but why re-invent the wheel?

Many thanks
Colin Larcombe
-------------------

Certified IBM Infosphere Datastage Developer
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

We are seeing DB2/z extract rates ranging from 11,000 rows per second on the low end to over 128,000 rows per second, depending on the table and record length. That is using the DB2 Connector stage and tens of millions of records. We also do inserts, updates, and deletes without any performance concerns; however, our volume on those operations is quite small.

I am afraid you are going down a dangerous (expensive) path of workarounds. Why is the performance so bad to begin with, and can you just find a solution to that problem?
Choose a job you love, and you will never have to work a day in your life. - Confucius
clarcombe
Premium Member
Posts: 515
Joined: Wed Jun 08, 2005 9:54 am
Location: Europe

Post by clarcombe »

qt_ky wrote:I am afraid you are going down a dangerous (expensive) path of workarounds. Why is the performance so bad to begin with, and can you just find a solution to that problem?
Our extract rates are between 3,000 and 5,000 rows per second. Our deletes and updates are atrociously slow - between 100 and 300 rows per second.

We use a JDBC connector because the DB2 connector was losing rows and IBM couldn't identify why.

I removed the triggers on the target tables, as we enforce referential integrity on the DataStage side, but this made no difference.

Our volume on the 40 tables we have to update is a meagre 500k-1m rows per table. We perform delete/inserts rather than updates. This still takes up to 4 hours, which is unacceptable.

I could spend a long time trying to find the cause, or I can work around it.

I am willing to entertain any suggestions you have.

We use 11.5 on a Linux machine.
Colin Larcombe
-------------------

Certified IBM Infosphere Datastage Developer
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

I have been down the path of workarounds before and can say it is expensive in terms of labor and lost time. In one case we had to discover workarounds to the workarounds, up to 5 or 6 layers deep, and the project went as many times over the original estimate!

In general, ensure that everything related to the primary solution has been investigated thoroughly to isolate and fix the bottleneck before getting too deep into workarounds. Have you experimented with tweaking the various settings in the Connector stage? Have you engaged DataStage Support? Maybe you have an IBM rep who can help find the right people or escalate. Has the DB2 DBA exhausted their resources, or do they see some contention? Have the mainframe system people taken a close look? Have the network/firewall people proven there's no bottleneck between the systems?
Choose a job you love, and you will never have to work a day in your life. - Confucius
clarcombe
Premium Member
Posts: 515
Joined: Wed Jun 08, 2005 9:54 am
Location: Europe

Post by clarcombe »

Many of the points you have mentioned are valid and we have indeed engaged the support of IBM on several occasions, for this and other cases.

We have also checked the speed of the connection using pings between our Linux machine and the mainframe. Whilst it's not blisteringly quick, it should be quick enough to allow a higher throughput.

We have seen how fast the SQL queries can run standalone on the mainframe, and in parallel too, so by generating the SQL and passing the scripts via MQ we can have several scripts running in parallel rapidly.

The jobs we have in place will be reused, and we will add new transactional granularity for the SQL, which is currently not easily obtained with the connectors. The pain is just converting the records to SQL using the metadata, hence my original question.
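
Just to illustrate the idea (table and column names invented here, not our real model), the generated script for a single delta record would be nothing more exotic than:

-- hypothetical generated text for one record
DELETE FROM MYSCHEMA.CUSTOMER WHERE CUSTOMER_ID = 12345;
INSERT INTO MYSCHEMA.CUSTOMER (CUSTOMER_ID, NAME, STATUS)
VALUES (12345, 'SMITH', 'A');
COMMIT;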

Another issue is that when there is a data or connection problem, the mainframe team can't identify it and it comes back to us to check. Running the scripts directly on their machine lets us see exactly what errors occur.

I do appreciate your comments, but really I am at a loss as to what else I can do.
Colin Larcombe
-------------------

Certified IBM Infosphere Datastage Developer
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Hi Colin.

I'm no expert on this subject, but considering that there are still additional options, it might be worth investigating: have you tried setting up an ODBC connection instead, using the Direct wire protocol drivers provided? It's been many years since I configured this myself, and perhaps it is only suitable for reading, but it might eliminate any problems introduced by JDBC.

Ernie
Ernie Ostic

blogit!
Open IGC is Here! - https://dsrealtime.wordpress.com/2015/0 ... ere/
clarcombe
Premium Member
Posts: 515
Joined: Wed Jun 08, 2005 9:54 am
Location: Europe

Post by clarcombe »

I think this was tried before we went down the JDBC route. ODBC is notoriously slow anyway, as it can't take advantage of the native drivers.

But both of you have raised good points; perhaps I should investigate the other options first, as the workaround will take a fair amount of time to develop.

That said, once in place we can then use it for further development as this is part of the new infrastructure being put in place.
Colin Larcombe
-------------------

Certified IBM Infosphere Datastage Developer
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

Some basic questions you may already have covered. My intention is to isolate the potential causes.

Have you done a simple replacement of the connector channel? By this I mean writing to a local file and using FTP to land the data on the mainframe, or setting up a test channel with some other DBMS like Oracle or MS SQL. You mentioned pinging the connection and it having sufficient bandwidth for the performance you expect, but that is a far cry from the overhead of a connector process.

Speaking of overhead: is your version of DB2 up to modern standards? How is the workload manager set up for external (non-z/OS) processes? Is z/OS properly configured for this sort of access? I don't know enough to suggest details, but there should be people in your shop who do.

The part where you use MQ makes me wonder about those issues. Pushing SQL text to run natively is very different from what happens out of a DB2 stage. Indeed, one of the basic performance questions to come out of the Big Data push is: where will your code run most effectively? The MQ "workaround" could turn out to be the better approach.

One last, possibly dumb, suggestion: revisit the referential integrity. RI is a constraint on the tables, and the performance hits will be on DB2, not in DataStage.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Everyone needs to remember that rows per second is misleading: it is the volume of data per second, not the row count, that matters.

If Job #1 had rows of, say, 20 characters each, its rows per second would be far higher than those of Job #2 extracting 2,000 characters per row from the same database/table.

So "My rows per second from DB2 on Zos" is not an indication of what you should be getting from your DB2 z connection, because he have a different row length.

And the DataStage rows-per-second calculation is misleading anyhow, because the stopwatch starts at the start of the job. The building of the result set on DB2 happens before you get row #1 out of it (I believe), so your rows/sec is not your actual data rate once data starts flowing; it's the average rate measured from the start of the job.
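
A quick made-up illustration of the row-length point: 100,000 rows/sec at 20 bytes per row and 1,000 rows/sec at 2,000 bytes per row are the same data rate.

Job #1: 100,000 rows/sec x 20 bytes/row = ~2 MB/sec
Job #2: 1,000 rows/sec x 2,000 bytes/row = ~2 MB/sec

Same pipe, wildly different rows-per-second numbers.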

==================

I would still opt for a DB2 Connector stage over any JDBC or ODBC.

ODBC is slow, but parallelized ODBC can meet your needs. A partitioned read means you have N ODBC connections to that system. It's the data pipe between you and the database; the more pipes, the better.
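
As a rough sketch of what a modulus partitioned read boils down to (key column invented, and the exact SQL the connector generates will differ), each of N players runs its own slice of the query:

-- hypothetical 4-way partitioned read, one query per player
SELECT * FROM MYSCHEMA.CUSTOMER WHERE MOD(CUSTOMER_ID, 4) = 0
SELECT * FROM MYSCHEMA.CUSTOMER WHERE MOD(CUSTOMER_ID, 4) = 1
-- ... and likewise for partitions 2 and 3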
clarcombe
Premium Member
Posts: 515
Joined: Wed Jun 08, 2005 9:54 am
Location: Europe

Post by clarcombe »

Thank you for your comments.

From what I have seen here, I could possibly do the following:

1) Test the connection to the mainframe using a DataStage FTP stage
2) Test the ODBC connector and "play" with the parameters
3) Retest the DB2 connector and "play" with the parameters
4) Rework the referential integrity of the database
5) Ensure that the latest version of DB2 is running

The last two aren't really in the scope of this work: they have just upgraded to DB2 11 in "conversion" mode, and the structure of the tables is fixed.

After discussions with the DB2 database designer, he suggests that we avoid MQ and try to pass "result sets" to a DB2 stored procedure - so instead of the to-ing and fro-ing of a stage, we pass the data to the mainframe in one go.

I'll look at this solution first. I assume it's possible to call a stored procedure from a DB2 or JDBC stage with a result set?
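
If that pans out, the call itself is just SQL (procedure name invented here), so it could be issued from a connector's after-SQL statement or a small follow-on job once the staging/temporary table has been loaded:

-- hypothetical: run once, after the staging table is populated
CALL MYSCHEMA.APPLY_CUSTOMER_DELTA();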
Colin Larcombe
-------------------

Certified IBM Infosphere Datastage Developer
R.K.Glover
Participant
Posts: 8
Joined: Mon Mar 11, 2013 2:51 pm
Location: RTP, North Carolina

Post by R.K.Glover »

clarcombe wrote:We use a JDBC connector because the DB2 connector was losing rows and IBM couldn't identify why.
...
<snip>
...
I am willing to entertain any suggestions you have.

We use 11.5 on a linux machine.
Honestly, I'd circle back to figuring out the DB2 connector. As others have pointed out, figuring out workarounds can be time-consuming and expensive, and it seldom results in improved performance.

Does it drop rows when you use only one partition? Do you have partitioned reads turned off? What's your isolation level? Autocommit? Fail on row error? Does it seem to magically drop rows after a perfectly round number, like 2,000? (That might indicate you have waves set up and it's only processing the first one.)

I haven't used a DB2 connector in about three years, but like most connectors it has its quirks, and you need somebody who knows them to get it configured properly.
clarcombe
Premium Member
Posts: 515
Joined: Wed Jun 08, 2005 9:54 am
Location: Europe

Post by clarcombe »

For reference and completeness, I am adding what we have decided to do after many weeks of discussion.

We have two modes for our inserts - full and delta. Full is for an initial run when the business rules have changed, and delta is for daily differences.

Old technique
Full mode
- Bulk load using DB2 connector to indexed target tables

Time indexed target tables are unavailable - 3 hours

Delta mode
- Deletes then inserts to indexed target tables using JDBC connector
Time for updates - 1-5 hours

New technique
Full mode
- Bulk load using DB2 connector to un-indexed staging tables
- DB2 merge script to load from the un-indexed staging tables into the indexed target tables (sketch below)

Time indexed target tables are unavailable - 45 minutes
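
A rough sketch of the shape of that merge step (table and column names invented, not our actual script, and the exact syntax accepted by a given DB2 for z/OS level may differ):

-- hypothetical merge from staging into the indexed target
MERGE INTO MYSCHEMA.CUSTOMER T
USING MYSCHEMA.CUSTOMER_STG S
   ON T.CUSTOMER_ID = S.CUSTOMER_ID
WHEN MATCHED THEN
   UPDATE SET T.NAME = S.NAME, T.STATUS = S.STATUS
WHEN NOT MATCHED THEN
   INSERT (CUSTOMER_ID, NAME, STATUS)
   VALUES (S.CUSTOMER_ID, S.NAME, S.STATUS);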

Delta mode
- Inserts into un-indexed, session-only staging tables (no DB2 logging or catalog involvement)
- Stored procedure to delete rows from the indexed target tables whose keys match the delta keys in the staging tables
- Stored procedure to insert the staging data into the indexed target tables (sketch of both below)
Time for updates - 0-1 hours
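
And a minimal sketch of the two delta-mode procedures rolled into one (names invented, native SQL procedure syntax, no error handling, and glossing over how the session staging table is declared and shared):

CREATE PROCEDURE MYSCHEMA.APPLY_CUSTOMER_DELTA ()
LANGUAGE SQL
BEGIN
  -- delete target rows whose keys appear in the staging table
  DELETE FROM MYSCHEMA.CUSTOMER T
   WHERE EXISTS (SELECT 1
                   FROM SESSION.CUSTOMER_STG S
                  WHERE S.CUSTOMER_ID = T.CUSTOMER_ID);
  -- then insert the staged rows into the indexed target
  INSERT INTO MYSCHEMA.CUSTOMER (CUSTOMER_ID, NAME, STATUS)
    SELECT CUSTOMER_ID, NAME, STATUS
      FROM SESSION.CUSTOMER_STG;
END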

Thanks to all for the time taken to contribute to this post.
Colin Larcombe
-------------------

Certified IBM Infosphere Datastage Developer