DSXchange: DataStage and IBM Websphere Data Integration Forum
clarcombe



Group memberships:
Premium Members

Joined: 08 Jun 2005
Posts: 499
Location: Europe
Points: 5223

Posted: Fri Aug 25, 2017 3:00 am

DataStage® Release: 11x
Job Type: Parallel
OS: Unix
We are currently investigating very poor insert performance against our DB2 z/OS machine.

Deletes run at around 100 rows per second, as do the inserts. One alternative we are looking at is to create the SQL statements ourselves and transfer them to the mainframe via MQ.

Are there any database stages which will actually create the SQL instead of running it? Of course I could write a job to do this, but why reinvent the wheel.
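
To illustrate, this is roughly the generator I would otherwise hand-roll: a minimal sketch in Python, with the schema, table, and column names made up.

Code:

    # Minimal sketch: build INSERT statements from a metadata
    # description of the target table instead of executing them.
    def quote(value):
        """Render a Python value as a DB2 SQL literal."""
        if value is None:
            return "NULL"
        if isinstance(value, (int, float)):
            return str(value)
        return "'%s'" % str(value).replace("'", "''")  # double up quotes

    def make_insert(table, columns, row):
        cols = ", ".join(columns)
        vals = ", ".join(quote(row[c]) for c in columns)
        return "INSERT INTO %s (%s) VALUES (%s);" % (table, cols, vals)

    # Made-up metadata and one sample row:
    columns = ["CUST_ID", "NAME", "LOAD_DT"]
    row = {"CUST_ID": 42, "NAME": "O'Brien", "LOAD_DT": "2017-08-25"}
    print(make_insert("SCHEMA1.CUSTOMER", columns, row))
    # INSERT INTO SCHEMA1.CUSTOMER (CUST_ID, NAME, LOAD_DT)
    #   VALUES (42, 'O''Brien', '2017-08-25');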

Many thanks

_________________
Colin Larcombe
-------------------

Certified IBM Infosphere Datastage Developer
qt_ky



Group memberships:
Premium Members

Joined: 03 Aug 2011
Posts: 2619
Location: USA
Points: 19402

Posted: Fri Aug 25, 2017 6:03 am

We are seeing DB2/z extract rates ranging from 11,000 rows per second on the low end to over 128,000 rows per second, depending on the table and record length. That is using the DB2 Connector stage and tens of millions of records. We also do inserts, updates, and deletes without any performance concerns; however, our volume on those operations is quite small.

I am afraid you are going down a dangerous (expensive) path of workarounds. Why is the performance so bad to begin with, and can you just find a solution to that problem?

_________________
Choose a job you love, and you will never have to work a day in your life. - Confucius
clarcombe



Group memberships:
Premium Members

Joined: 08 Jun 2005
Posts: 499
Location: Europe
Points: 5223

Posted: Fri Aug 25, 2017 6:19 am

qt_ky wrote:
I am afraid you are going down a dangerous (expensive) path of workarounds. Why is the performance so bad to begin with, and can you just find a solution to that problem?


Our extract rates are between 3,000 and 5,000 rows per second. Our deletes and updates are atrociously slow: between 100 and 300 rows per second.

We use a JDBC connector because the DB2 connector was losing rows and IBM couldn't identify why.

I removed the triggers on the target tables, since we enforce referential integrity on the DataStage side, but this made no difference.

Our volume on the 40 tables we have to update is a meagre 500k-1m rows per table. We perform delete-inserts rather than updates. This still takes up to 4 hours, which is unacceptable.
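
For reference, the per-table pattern is essentially the following; a rough sketch with made-up table and column names, values assumed numeric for brevity, and an arbitrary commit interval.

Code:

    def delete_insert_script(table, key_col, rows, commit_every=1000):
        # Emit a DELETE followed by an INSERT per row, with a COMMIT
        # every N rows to limit DB2 log and lock pressure.
        stmts = []
        for i, row in enumerate(rows, 1):
            stmts.append("DELETE FROM %s WHERE %s = %s;"
                         % (table, key_col, row[key_col]))
            cols = ", ".join(row)
            vals = ", ".join(str(v) for v in row.values())  # numeric only
            stmts.append("INSERT INTO %s (%s) VALUES (%s);"
                         % (table, cols, vals))
            if i % commit_every == 0:
                stmts.append("COMMIT;")
        stmts.append("COMMIT;")
        return "\n".join(stmts)

    print(delete_insert_script("SCHEMA1.ORDERS", "ORDER_ID",
                               [{"ORDER_ID": 1, "QTY": 5},
                                {"ORDER_ID": 2, "QTY": 3}]))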

I could spend a long time trying to find the cause or I can workaround.

I am willing to entertain any suggestions you have.

We use 11.5 on a Linux machine.

_________________
Colin Larcombe
-------------------

Certified IBM Infosphere Datastage Developer
qt_ky



Group memberships:
Premium Members

Joined: 03 Aug 2011
Posts: 2619
Location: USA
Points: 19402

Posted: Fri Aug 25, 2017 6:36 am

I have been down the path of workarounds before and can say it gets expensive in terms of labor and lost time. In one case we had to discover workarounds to the workarounds, 5 or 6 layers deep, and the project went that many times over the original estimate!

In general, ensure that everything related to the primary solution has been investigated thoroughly to isolate and fix the bottleneck before getting too deep into workarounds. Have you experimented with tweaking the various settings in the Connector stage? Have you engaged DataStage Support? Maybe you have an IBM rep who can help find the right people or escalate. Has the DB2 DBA exhausted their resources, or do they see some contention? Have the mainframe system people taken a close look? Have the network/firewall people proven there's no bottleneck between the systems?

_________________
Choose a job you love, and you will never have to work a day in your life. - Confucius
clarcombe



Group memberships:
Premium Members

Joined: 08 Jun 2005
Posts: 499
Location: Europe
Points: 5223

Posted: Fri Aug 25, 2017 7:02 am

Many of the points you have mentioned are valid and we have indeed engaged the support of IBM on several occasions, for this and other cases.

We have also checked the speed of the connection using pings between our Linux machine and the mainframe. Whilst it's not blisteringly quick, it should be quick enough to give a higher throughput than we are seeing.

We have seen how fast the SQL queries can run stand-alone on the mainframe, and in parallel too, so by generating the SQL and passing the scripts via MQ we can have several scripts running in parallel rapidly.

The jobs we have in place will be reused, and we will add new transactional granularity for the SQL, which is not easily obtained with the connectors at present. The pain is just converting the records to SQL using metadata, hence my original question.
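
The MQ leg itself would be small; a sketch using the pymqi package, with the queue manager, channel, connection string, and queue name all invented.

Code:

    import pymqi

    # A generated SQL script (see above) becomes one MQ message that
    # the mainframe side picks up and executes natively.
    sql_script = "DELETE FROM SCHEMA1.CUSTOMER WHERE CUST_ID = 42;\nCOMMIT;"

    qmgr = pymqi.connect('QMGR1', 'APP.SVRCONN', 'mainframe.example.com(1414)')
    queue = pymqi.Queue(qmgr, 'SQL.SCRIPTS.IN')
    queue.put(sql_script.encode('utf-8'))
    queue.close()
    qmgr.disconnect()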

Another issue is that when there is a data or connection problem, the mainframe team can't identify it and it comes back to us to check. Running the scripts directly on their machine lets us see exactly what errors occur.

I do appreciate your comments, but really I am at a loss to see what else I can do.

_________________
Colin Larcombe
-------------------

Certified IBM Infosphere Datastage Developer
eostic

Premium Poster



Group memberships:
Premium Members

Joined: 17 Oct 2005
Posts: 3708

Points: 29604

Posted: Fri Aug 25, 2017 7:10 am

Hi Colin. I'm no expert on this subject, but considering the fact that there are still additional options, it might be worth investigating: have you tried setting up an ODBC connection instead ...

_________________
Ernie Ostic

blogit!
Open IGC is Here!
clarcombe



Group memberships:
Premium Members

Joined: 08 Jun 2005
Posts: 499
Location: Europe
Points: 5223

Posted: Fri Aug 25, 2017 7:17 am

I think this was tried before we went the JDBC route. ODBC is notoriously slow anyway, as it can't take advantage of the native drivers.

But both of you have raised good points; perhaps I should investigate other options first, as the workaround will take a fair amount of time to develop.

That said, once in place we can then use it for further development as this is part of the new infrastructure being put in place.

_________________
Colin Larcombe
-------------------

Certified IBM Infosphere Datastage Developer
FranklinE



Group memberships:
Premium Members

Joined: 25 Nov 2008
Posts: 614
Location: Malvern, PA
Points: 5825

Posted: Fri Aug 25, 2017 8:56 am

Some basic questions you may already have covered. My intention is to isolate the potential causes.

Have you done a simple replacement of the connector channel? By this I mean writing to a local file and using FTP to land the data on the mainframe, or setting up a test channel with some other DBMS like Oracle or MS SQL. You mentioned pinging the connection and that the bandwidth is sufficient for the performance you expect, but a ping is a far cry from the overhead of a connector process.
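
For example, a crude timing harness along these lines takes the connector out of the picture entirely; the host, credentials, and file names are placeholders.

Code:

    import os
    import time
    from ftplib import FTP

    # Time a bulk FTP upload to get a raw throughput number for the
    # same network path the connector uses.
    ftp = FTP('mainframe.example.com')
    ftp.login('user', 'password')
    start = time.time()
    with open('testdata.dat', 'rb') as f:  # a file of known size
        ftp.storbinary('STOR TESTDATA.DAT', f)
    elapsed = time.time() - start
    ftp.quit()
    mb = os.path.getsize('testdata.dat') / 1024.0 / 1024.0
    print("%.1f MB in %.1f s -> %.2f MB/s" % (mb, elapsed, mb / elapsed))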

Speaking of overhead: is your version of DB2 up to modern standards? How is the workload manager set up for external (non-z/OS) processes? Is z/OS properly configured for this sort of access? I don't know enough to suggest details, but there should be people in your shop who do.

The part where you use MQ makes me wonder about those issues. Pushing SQL text to run natively is very different from what happens out of a DB2 stage. Indeed, one of the basic performance questions to come out of the Big Data push is: where will your code run most effectively? The MQ "workaround" could turn out to be the better choice of approach.

One last, possibly dumb, suggestion: revisit the referential integrity. RI is a constraint on the tables, and the performance hits will be in DB2, not in DataStage.

_________________
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: http://www.dsxchange.com/viewtopic.php?t=143596
PaulVL



Group memberships:
Premium Members

Joined: 17 Dec 2010
Posts: 1139

Points: 7537

Posted: Fri Aug 25, 2017 2:08 pm

Everyone needs to remember that rows per second is misleading: it is the volume of data per second, not the row count, that matters.

If Job #1's columns summed to 20 characters per row, its rows per second would be far greater than Job #2's extracting 2,000 characters per row from the same database/table.

So "My rows per second from DB2 on Zos" is not an indication of what you should be getting from your DB2 z connection, because he have a different row length.

And the DataStage rows-per-second calculation is misleading anyhow, because the stopwatch starts at the start of the job. The building of the result set on DB2 happens before you get row #1 out of it (I believe). So your rows/sec is not your actual data rate once rows start flowing; it's the average rate measured from the start of the job.
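
To put rough numbers on it, assume a made-up 10 MB/s pipe:

Code:

    # The same pipe gives very different rows/sec depending on row width.
    bytes_per_sec = 10 * 1024 * 1024
    for row_len in (20, 2000):  # bytes per row
        print("%4d-byte rows: %9.0f rows/sec" % (row_len, bytes_per_sec / row_len))
    #   20-byte rows:    524288 rows/sec
    # 2000-byte rows:      5243 rows/sec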

==================

I would still opt for a DB2 Connector stage over any JDBC or ODBC.

ODBC is slow, but parallelized ODBC can meet your needs. A partitioned read means you have N ODBC connections to that system. It's the data pipe between you and the database: the more pipes, the better.
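
A hand-rolled version of that looks roughly like this; a sketch using the pyodbc package, with the DSN, credentials, table, and key column all invented.

Code:

    import pyodbc
    from concurrent.futures import ThreadPoolExecutor

    N = 4  # number of parallel ODBC connections ("pipes")
    SQL = "SELECT * FROM SCHEMA1.CUSTOMER WHERE MOD(CUST_ID, ?) = ?"

    def read_partition(p):
        # Each worker opens its own connection and reads one MOD slice.
        conn = pyodbc.connect('DSN=DB2Z;UID=user;PWD=secret')
        rows = conn.cursor().execute(SQL, N, p).fetchall()
        conn.close()
        return len(rows)

    with ThreadPoolExecutor(max_workers=N) as pool:
        print(list(pool.map(read_partition, range(N))))
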
clarcombe



Group memberships:
Premium Members

Joined: 08 Jun 2005
Posts: 499
Location: Europe
Points: 5223

Posted: Mon Aug 28, 2017 2:22 am

Thank you for your comments.

From what I have seen here, I could possibly do the following:

1) Test the connection to the mainframe using a DataStage FTP stage
2) Test the ODBC Connector and "play" with the parameters
3) Retest the DB2 Connector and "play" with the parameters
4) Rework the referential integrity of the database
5) Ensure that the latest version of DB2 is running

The last two aren't really in scope for this work. They have just upgraded to DB2 11 in "conversion" mode, and the structure of the tables is fixed.

After discussions with the DB2 database designer, he suggests we avoid MQ and instead try to pass "result sets" to a DB2 stored procedure. So instead of the to-ing and fro-ing of a stage, we pass the data in one fell swoop to the mainframe.

I'll look at this solution first. I assume it's possible to call a stored procedure from a DB2 or JDBC stage with a result set?
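
From the ODBC side I imagine it would look something like this; the CALL escape syntax is standard ODBC, but the procedure name and parameter are invented.

Code:

    import pyodbc

    conn = pyodbc.connect('DSN=DB2Z;UID=user;PWD=secret')
    cur = conn.cursor()
    # Hypothetical procedure that applies a staged batch on the mainframe.
    cur.execute("{CALL SCHEMA1.APPLY_CUSTOMER_BATCH(?)}", 'BATCH_20170828')
    if cur.description:  # the procedure returned a result set
        for row in cur.fetchall():
            print(row)
    conn.commit()
    conn.close()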

_________________
Colin Larcombe
-------------------

Certified IBM Infosphere Datastage Developer
R.K.Glover
Participant



Joined: 11 Mar 2013
Posts: 7
Location: RTP, North Carolina
Points: 45

Posted: Thu Aug 31, 2017 7:58 am

clarcombe wrote:
We use a JDBC connector because the DB2 connector was losing rows and IBM couldn't identify why.
...
<snip>
...
I am willing to entertain any suggestions you have.

We use 11.5 on a linux machine.


Honestly, I'd circle back to figuring out the DB2 Connector. As others have pointed out, figuring out workarounds can be time-consuming and expensive, and it seldom results in improved performance.

Does it drop rows when you use only one partition? Do you have partitioned reads turned off? What's your isolation level? Autocommit? Fail on Row Error? Does it seem to magically drop rows after a perfectly round number, like 2,000? (That might indicate you have waves set up and it's only processing the first one.)

I haven't used a DB2 Connector in about three years, but like most connectors it has its quirks, and you need somebody who knows them to get it configured properly.