JDBC Connector Performance Tuning

satheesh_color
Participant
Posts: 182
Joined: Thu Jun 16, 2005 2:05 am

JDBC Connector Performance Tuning

Post by satheesh_color »

Hi All,

We are using the JDBC Connector to connect to a remote Oracle database through the thin client (a Type 4 JDBC connection).

When we try to extract data from the database, the fetch rate is very low (800-1000 rows/sec).

Properties:
FetchSize: 2000
HeapSize: 256
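
(For context: the connector's FetchSize maps to the standard JDBC fetch-size hint, which sets how many rows the driver pulls per network round trip. A minimal driver-level sketch, with a hypothetical connection URL, credentials, and table name:)

import java.sql.*;

// Sketch of what FetchSize controls at the driver level. Larger values mean
// fewer network round trips at the cost of more client memory.
public class FetchSizeSketch {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB", "user", "password");
             Statement stmt = conn.createStatement()) {
            stmt.setFetchSize(2000); // rows pulled per round trip
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM source_table")) {
                long rows = 0;
                while (rs.next()) rows++;
                System.out.println("Fetched " + rows + " rows");
            }
        }
    }
}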

Kindly let me know your thoughts on how to increase the performance of the data extraction.

Thanks,
Satheesh.R
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You need to determine the bottleneck. Rather than a magic setting in the connector, it could be the source database, the complexity of the source SQL, network issues, or simply how long it takes to write to the target. Eliminate them one by one.
-craig

"You can never have too many knives" -- Logan Nine Fingers
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Also, are you doing a partitioned read?

Rows per second is misleading if you do not indicate what your row width is. 800 rows per second when you have 100 bytes per row is terrible, but if you have 100K bytes per row, then 800 is pretty darn good.
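
As a quick back-of-the-envelope check, converting those two scenarios to bytes per second (a minimal sketch; substitute your actual row width):

// Rows/sec alone says nothing without row width; convert to bytes/sec.
public class ThroughputCheck {
    public static void main(String[] args) {
        double rowsPerSec = 800;
        System.out.printf("100-byte rows:  %.2f MB/s%n", rowsPerSec * 100 / 1e6);      // ~0.08 MB/s
        System.out.printf("100K-byte rows: %.0f MB/s%n", rowsPerSec * 100_000 / 1e6);  // ~80 MB/s
    }
}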
satheesh_color
Participant
Posts: 182
Joined: Thu Jun 16, 2005 2:05 am

Post by satheesh_color »

Hi Paul,

Can we use a partitioned read on non-partitioned tables? When I enable partitioned read, the row count shows a huge variation, although it fetches much faster (25k rows/sec). I am really worried about the huge count mismatch.

Kindly let me know your thoughts on the same.


Regards,
Satheesh.R
leandrohmvieira
Participant
Posts: 44
Joined: Wed Sep 02, 2015 7:19 am
Location: Brasilia, Brazil

Post by leandrohmvieira »

Hello Satheesh, you can certainly use partitioned reads on non-partitioned tables; there are partitioning methods such as round-robin which divide the table evenly.

You just need to be aware of your ETL design: some stages need specific partition types to work, so if you have a complex design, your data may need to be repartitioned, leading to new bottlenecks.

But if it is a simple source -> target design, it will be fine. Just run a validation/count on the target, based on some key, after the operation to check (see the sketch below).
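
A minimal JDBC sketch of that post-load validation, with a hypothetical table name, URL, and credentials (within DataStage itself you would more likely compare the source count against the Data Set row count reported in the job log):

import java.sql.*;

// Sketch: compare the source table's row count against the row count observed
// on the target after the load. A mismatch means the partitioned read
// duplicated or dropped rows.
public class CountValidation {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB", "user", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM source_table")) {
            rs.next();
            long sourceCount = rs.getLong(1);
            long targetCount = 1000L; // taken from the target / job log after the run
            System.out.println(sourceCount == targetCount
                    ? "Counts match: " + sourceCount
                    : "MISMATCH: source=" + sourceCount + ", target=" + targetCount);
        }
    }
}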
Leandro Vieira

Data Expert - Brasilia, Brazil
satheesh_color
Participant
Posts: 182
Joined: Thu Jun 16, 2005 2:05 am

Post by satheesh_color »

Hi,

We are using simple extraction jobs from the JDBC Connector to a Data Set (using hash partitioning), which is then used in other jobs for join/lookup. I can still see the count mismatch (a huge difference) when I run the job with Partitioned read = Yes.

I appreciate your help!



Regards,
Satheesh.R
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

What is the count mismatch you are talking about? Are you talking about the end result in the quantity of rows obtained? Or simply the rate at which you are extracting from Oracle?
satheesh_color
Participant
Posts: 182
Joined: Thu Jun 16, 2005 2:05 am

Post by satheesh_color »

Hi Paul,

By count mismatch, I mean that I can see a huge difference in both the extracted row count and the Data Set row count.

E.g.: my source table has a count of 1,000 rows; when I run on 4 nodes with partitioned read enabled, the extraction count exceeds 5,000 rows.


Regards,
Satheesh.R
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Check with your support provider to determine if there are any known issues with partitioned read in the JDBC Connector.

Add $CC_MSG_LVL to your job parameters and set it to 1, 2, or 3 to enable some connector tracing to be recorded in the job log. I usually start with a setting of 2. You may have to add it as a user defined project environment variable if you have not done so previously (leave the default empty so as not to generate excessive logging across your entire project).

In the job log you will see the actual query that was passed to the database. A partitioned read usually injects some additional logic in the WHERE clause. If you have anything more complex than a single table SELECT, the partitioned read may not work properly. You can determine that by evaluating the generated query.
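
For illustration, a modulus-style partitioned read typically rewrites the base query along the following lines, with one predicate per player node. This is only a sketch of the general pattern, not the connector's exact output, and "id" stands in for a hypothetical numeric partitioning column:

// Each of N nodes runs the same base query with a disjoint filter, so the
// union of all partitions is exactly the whole table. If the injected
// predicate is missing or ineffective, every node returns ALL rows, which
// produces the kind of count inflation described earlier in this thread.
public class PartitionedReadSketch {
    public static void main(String[] args) {
        int nodeCount = 4;
        String baseQuery = "SELECT * FROM source_table";
        for (int node = 0; node < nodeCount; node++) {
            System.out.println(baseQuery + " WHERE MOD(id, " + nodeCount + ") = " + node);
        }
    }
}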

Mike
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Start to debug in a phased approach.

Chop your job up into pieces:

Oracle Connector -> Data Set
Oracle Connector -> Sequential File

With 1,000 rows in your source table, I would have expected maybe 4,000 rows in your output if your partitioning was Entire, but... 5,000-ish?!? That is odd.

When faced with an issue, I like to break it down into its smallest components. That way you remove "noise" from the equation. If it is your Oracle read that you suspect is wrong, then just do the read in your job, with no other transforms, etc.