DSXchange: DataStage and IBM Websphere Data Integration Forum
Developer9



Group memberships:
Premium Members, Heartland Usergroup

Joined: 14 Apr 2011
Posts: 174

Points: 2437

Posted: Tue Jun 19, 2018 10:44 am

DataStage® Release: 11x
Job Type: Parallel
OS: Windows
Hi,
I am debugging a job that runs on a single node (DS 11.5 / Windows server) and takes about 5 minutes of production run time. My goal is to identify the bottlenecks and tune the job so it runs more efficiently (in less time).

The job design reads Oracle source data (Connector stages, record counts ranging from 0.1M to 4M) and uses datasets (500 to 100,000 records) as reference data for Join and Lookup operations. There are also several other stages: Sort, Copy, Aggregator, Funnel, and Transformer. The target is 2 datasets being written. (Overall, 50 or more stages.)



Transformer logic

Code:
** timestamp to varchar conversion **
If IsNull(DATE_CREATED) Then '' Else If IsValidTimestamp(DATE_CREATED) Then DateToString(TimestampToDate(DATE_CREATED), "%mm/%dd/%yyyy") Else ''

** decimal conversion **
DecimalToString(val, "fix_zero,suppress_zero")


Director job logs


10:16:34 AM Starting Job ..
$APT_CONFIG_FILE = C:\IBM\InformationServer\Server\Configurations\1Node.apt
$APT_DISABLE_COMBINATION = True
$APT_SORT_INSERTION_CHECK_ONLY = False
$APT_STRING_PADCHAR = 0x0 (From value file)
$APT_PM_PLAYER_TIMING = False (From value file)
$APT_PM_PLAYER_MEMORY = True (From value file)
$APT_RECORD_COUNTS = True (From value file)
$OSH_PRINT_SCHEMAS = True (From value file)
$APT_DUMP_SCORE = True (From value file)
$APT_STARTUP_STATUS = False (User supplied)

10:16:34 AM Attached Message Handlers:
10:16:37 AM Project message handler: SortStageResort
10:16:37 AM Environment variable settings:
...
Parallel job initiated
OSH script ..
# OSH / orchestrate script for Job
main_program: orchgeneral: loaded
10:16:39 AM orchsort: loaded
10:16:39 AM orchstats: loaded
**oracle connector extract ** (elapsed time =14 sec)
10:16:53 AM ext_CONTACTS_plus: The connector connected to Oracle server oradb1.

**dataset folder**
10:17:54 AM is/isdata/tgt1_ds,0: Heap growth during runLocally(): 86024192 bytes

***Transformer stage Messages (elapsed time =9 sec)
10:18:47 AM xfm_standardize,0 :Input 0 consumed 520749 records.
10:18:47 AM xfm_standardize,0: Output 0 produced 520749 records.
10:18:56 AM xfm_standardize,0: Heap growth during runLocally(): 85639168 bytes

***LOOKUP stage (elapsed time =4 sec)
10:19:06 AM lkp_ob_co_bond_obi,0: Heap growth during runLocally(): 133578752 bytes
10:19:10 AM buffer(1),0: Heap growth during runLocally(): 85590016 bytes
10:19:10 AM buffer(0),0: Heap growth during runLocally(): 85708800 bytes

***funnel stage
10:19:10 AM funl_inv_fiv,0: Heap growth during runLocally(): 85143552 bytes

***sort
10:19:11 AM srt_bond,0: Heap growth during runLocally(): 85983232 bytes
10:19:12 AM buffer(10),0: Heap growth during runLocally(): 83468288 bytes
**oracle connector stage (elapsed time =22 sec)
10:19:12 AM ext_plus,0: Number of rows fetched on the current node: 1151731.
10:19:12 AM ext_plus,0: Output 0 produced 1151731 records.
10:19:34 AM ext_plus,0: Heap growth during runLocally(): 82477056 bytes
***copy stage ** (elapsed time = 1 minute, 10 sec)

10:19:38 AM cpy_stage,0: Output 0 produced 1151731 records.
10:19:38 AM cpy_stage,0: Input 0 consumed 1151731 records.
10:20:48 AM cpy_stage,0: Input 0 consumed 1151731 records.

***join stage ** (Left outer join, input links Auto partition) (elapsed time = 1 minute, 4 sec)

10:21:33 AM Jnr_Bond_id,0: Input 0 consumed 1151731 records.
10:21:33 AM Jnr_Bond_id,0: Output 0 produced 1151731 records.
10:22:45 AM Jnr_Bond_id,0: Heap growth during runLocally(): 80654336 bytes
10:22:47 AM buffer(23),0: Heap growth during runLocally(): 80687104 bytes

***dataset ** (partition/collection --Auto )(elapsed time =6 sec)

10:22:49 AM ds_tgt1,0: Input 0 consumed 1207442 records.
10:22:49 AM ds_tgt1,0: Input 0 produced 1207442 records.
10:22:55 AM ds_tgt1,0: Heap growth during runLocally(): 79831040 bytes

10:22:55 AM main_program: Step execution finished with status = OK.
10:22:58 AM main_program: Startup time, 1:18; production run time, 4:57.
10:22:58 AM Parallel job reports successful completion
10:22:59 AM Finished Job .

Redesign: Split the original job into 2 jobs.

Job 1: extract the Oracle data and write it to 2 datasets.

Job 2: use the 2 datasets as sources for the rest of the original job. This doesn't help either; in fact it gets worse:

main_program: Startup time, 0:37; production run time, 13:42.

Please give me some insights on how to performance tune this job.

Thanks.
chulett

Premium Poster


since January 2006

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 12 Nov 2002
Posts: 42796
Location: Denver, CO
Points: 220596

Posted: Tue Jun 19, 2018 2:11 pm

So... let me see... you have a Parallel job with 50+ stages that processes 1 to 4 million records on a Windows server in around 5 minutes and you think the performance needs to be improved? Okay. Might be a bit difficult to whittle that down. Have you tried simply increasing the number of nodes, say to 2 as an experiment? Obviously you'd need to check your partitioning strategy if you haven't already.
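For reference, a 2-node version of your configuration file would generally look something like the sketch below. The fastname and resource paths here are only placeholders, not your actual values, so adjust them to match your install:

Code:
{
  node "node1"
  {
    fastname "yourserver"
    pools ""
    resource disk "C:/IBM/InformationServer/Server/Datasets" {pools ""}
    resource scratchdisk "C:/IBM/InformationServer/Server/Scratch" {pools ""}
  }
  node "node2"
  {
    fastname "yourserver"
    pools ""
    resource disk "C:/IBM/InformationServer/Server/Datasets" {pools ""}
    resource scratchdisk "C:/IBM/InformationServer/Server/Scratch" {pools ""}
  }
}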

I'd also be curious why the performance concern. Is this something that will be running many times over the course of the day, for example?

_________________
-craig

"I don't mind you comin' here and wastin' all my time time"
Developer9



Group memberships:
Premium Members, Heartland Usergroup

Joined: 14 Apr 2011
Posts: 174

Points: 2437

Posted: Wed Jun 20, 2018 4:08 pm

@Chulett,

For the first question, regarding increasing the number of nodes to 2, I attempted a test run:

Code:
main_program: Startup time, 1:38; production run time, 8:42.

Regarding partitioning: it's a hash partition on the Sort stages after the Oracle Connector extract, and further down the flow I try to maintain that with "Same" partitioning everywhere, including the inputs to the Join/Lookup stages. What does heap growth generally indicate?

Currently these jobs run (extracting and generating delta files using Change Data Capture) on a 2-hour interval, but the business need is to run every hour if possible. That is why I am tuning the jobs, from both a runtime and a resource utilization standpoint.

Thanks.
chulett

Premium Poster


since January 2006

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 12 Nov 2002
Posts: 42796
Location: Denver, CO
Points: 220596

Posted: Wed Jun 20, 2018 9:46 pm

Sorry, still a bit stuck on this... it needs to run every hour and you're worried about a five-minute runtime? What was the worst runtime you've seen so far? I'm assuming it is being tracked.

I was worried you actually needed to run it every five minutes. I got tasked with that back in the day: take a nightly task that churned out a ton of XML in a couple of hours and cut it back so it could process "micro-batches" every five minutes. Managed to do it, though. I added a safety net that the looping Sequence job controlling it could leverage: track the previous runtime, and if we floated past 5 minutes (gawd forbid it ran 6 minutes) it would delay the loop until the next 5-minute clock time. Worked surprisingly well. Of course, these were all Server jobs without all of the crazy ramp-up and cool-down times that PX can be saddled with, but hey, made it work.

It's been far too long since I had any hands-on DataStage experience, so I'll let others give any specific partitioning advice they're willing to share. I'm also unclear what the "heap growth" question is about or how to answer it, so I will leave that to others as well.

_________________
-craig

"I don't mind you comin' here and wastin' all my time time"
qt_ky



Group memberships:
Premium Members

Joined: 03 Aug 2011
Posts: 2822
Location: USA
Points: 21387

Posted: Thu Jun 21, 2018 5:09 am

If I had a 60-minute window and the job was taking only 5 to 10 minutes, I might take a quick glance at tuning and then quickly move on.

What jumps out at me is that the job startup times look pretty bad. What in the world is going on there? Have you got some slow before-SQL statements in your database stages? Our startup times are consistently 2 to 4 seconds, even on jobs with 30+ stages. There have been some known issues with long startup times, so it is possible you are on a version with such an issue and may need a patch. Open a support case to get help debugging it. Please let us know what you find out.

Another thing to question is the overall job design: whatever it is doing, is it reprocessing the same records and pulling in the full volume of data with each and every run, or could it be redesigned to process, at each step, only the records that are new, updated, or deleted?
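For example, if your source tables carry a reliable last-updated timestamp, the Oracle Connector's extract could be limited to rows changed since the previous successful run instead of pulling the full table. This is only a sketch; the table name, column name, and job parameter below are placeholders, not from your actual job:

Code:
-- incremental extract: only rows changed since the last successful run
SELECT *
  FROM contacts
 WHERE last_updated >= TO_TIMESTAMP('#LAST_RUN_TS#', 'YYYY-MM-DD HH24:MI:SS')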

_________________
Choose a job you love, and you will never have to work a day in your life. - Confucius