DSXchange: DataStage and IBM Websphere Data Integration Forum
View next topic
View previous topic
Add To Favorites
Author Message
vickymsw02
Participant



Joined: 18 Apr 2018
Posts: 5

Points: 49

Post Posted: Wed Apr 18, 2018 1:54 pm Reply with quote    Back to top    

DataStage® Release: 11x
Job Type: Parallel
OS: Unix
Hello Everyone,

I have a requirement to run and complete a Parallel job based on a value in an Oracle table. Need your valuable suggestions how can this be accomplished.

Oracle table (table1) contains a Date attribute. Date in table1 is updated with System Date every day once anytime between 12 AM and 11 PM. The DataStage job should check this table every 5 minutes and see whether the table has been updated with the system date. If the table is updated, the DataStage job should be completed writing the value in the table1 to a file.

-Vicky
chulett

Premium Poster


since January 2006

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 12 Nov 2002
Posts: 42762
Location: Denver, CO
Points: 220350

Post Posted: Wed Apr 18, 2018 2:35 pm Reply with quote    Back to top    

It could be just a matter of scheduling the job to run every five minutes looking for a new record in the source table. If it finds it, it runs to completion and does something to mark the triggering record as 'processed'. When it doesn't find a new record, it simply processes nothing.

Another option is to use a Sequence job with a Start/End Loop construct in it that ran the job, waited five minutes (or to the next five minute mark) via a custom routine and then loops around again.

Lastly, you could also loop / schedule a light-weight polling job that all it does is look for the triggering record in the table. Based on its success, you could trigger (or not trigger) the actual processing job. I've done this when I only want to crank up a 'heavyweight' processing job if there's something to process. This might also be something to consider when your target is a flat file as you've mentioned... since constantly running a job with a file as its target will constantly create that file even when zero records move through the job.

_________________
-craig

Research shows that 6 out of 7 dwarves aren't happy
Rate this response:  
Not yet rated
FranklinE



Group memberships:
Premium Members

Joined: 25 Nov 2008
Posts: 692
Location: Malvern, PA
Points: 6561

Post Posted: Wed Apr 18, 2018 2:43 pm Reply with quote    Back to top    

Craig has good suggestions. I wonder about whether your jobs run in isolation, or if you have a scheduling application like Control-M to manage the jobs. If, for example, there was a scheduled job which updates the system date, your DataStage job would just wait for it to run and complete the update. It just seems odd that you have to run a job 12 times per hour for up to 23 hours.

_________________
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: http://www.dsxchange.com/viewtopic.php?t=143596 Using CFF FAQ: http://www.dsxchange.com/viewtopic.php?t=157872
Rate this response:  
Not yet rated
vickymsw02
Participant



Joined: 18 Apr 2018
Posts: 5

Points: 49

Post Posted: Wed Apr 18, 2018 3:01 pm Reply with quote    Back to top    

Thank Craig and Franklin. Yes, we use a third party scheduler. So can I go ahead with the below approach?


-->Extract date from the table.
-->Start loop
-->Check if the date in the table is equal to system date.
-->If Yes, go ahead and write this to a file.
-->Else wait for 5 minutes (How do I achieve this in the loop and without using routines. Can I use a wait for a file activity for this?
-->Exit loop once date matches with sysdate.

Please suggest me a different approach with steps if you in the mind.

-Vicky
Rate this response:  
Not yet rated
eostic

Premium Poster



Group memberships:
Premium Members

Joined: 17 Oct 2005
Posts: 3782

Points: 30373

Post Posted: Thu Apr 19, 2018 4:08 am Reply with quote    Back to top    

It does seem confusing for you to need to poll for the change? You noted that this date changes once per day.... what initiates that change? A scheduled system program? A transaction? A web ser ...

_________________
Ernie Ostic

blogit!
Open IGC is Here!
Rate this response:  
Not yet rated
ArndW

Premium Poster
Participant

Group memberships:
Premium Members, Inner Circle, Australia Usergroup

Joined: 16 Nov 2004
Posts: 16318
Location: Germany
Points: 92566

Post Posted: Thu Apr 19, 2018 5:35 am Reply with quote    Back to top    

Since it is complicated to add a loop-and-sleep mechanism in a parallel Job, you might want to do this whole process in a only a Job sequence. As above, use the Loop functionality and use an external call to issue the Oracle connect-and-query commands and parse the Response to compare it with the current date.

This wouldn't consume as many resources as writing a parallel Job that gets started frequently.

Another Approach is not to use DataStage. If you add a Trigger to your table to call the Oracle UTL_FILE functionality you can automagically create a file (on the Oracle Server) when the record changes. No polling required.

_________________

Image
Rate this response:  
Not yet rated
chulett

Premium Poster


since January 2006

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 12 Nov 2002
Posts: 42762
Location: Denver, CO
Points: 220350

Post Posted: Thu Apr 19, 2018 6:57 am Reply with quote    Back to top    

There's a lot of concern here with running a Parallel job over and over and over again during the course of the day. Can you clarify something - does this job do more than simply write a value from the table to a file? Or does the job do actual processing work AND you need this file value written?

And as noted, none of these suggestions about looping or polling were meant for use inside a parallel job, that's just not in the nature of the beast. What you'd have would be a looping mechanism - be it a Sequence job or script or what have you - that would run the job at a regular interval. Ideally only when it was time to do the work, something much more light-weight could do the actual polling / checing. Perhaps even your third party scheduler could handle that aspect of it, depending on its capabilities.

For the "wait" question, you really should use a routine for the Sequence job approach. They are simple to build and are really something you should have in your toolbox as a DataStage developer. IMHO. However, yes, you could use a WFF stage for this. Only thing to keep in mind is with no actual file involved, when you tell it to only wait for five minutes and the time expires, the stage will take the "failure" route when it moves forward. You'll have to explicitly handle that so the sequence knows it's okay rather than having it abort the entire sequence. My routine calculated how many seconds it would need to wait in order to sleep until the next fifteen minute mark, not fifteen minutes. This way the job ran on every quarter hour rather than its start time slowly creeping around the dial, which would have driven me crazy. But maybe that was just me. Wink

_________________
-craig

Research shows that 6 out of 7 dwarves aren't happy
Rate this response:  
Not yet rated
vickymsw02
Participant



Joined: 18 Apr 2018
Posts: 5

Points: 49

Post Posted: Thu Apr 19, 2018 9:23 am Reply with quote    Back to top    

Let me make this more clear. In the current scenario in scheduler, we have 2 batches dependent on each other. Batch 1 is a mainframe batch that updates source tables with updated values which exceeds more than 24 hours. Batch2 (A DataStage sequence) is dependent on batch1 execution completion since it does a look up of these tables updated by batch. Now since batch2 cannot wait for 24 hours, mainframe source team has agreed that once the necessary tables required by batch2 are updated, they will update a table with the system date so that batch can execute. So in the scheduler, we are planning to introduce a new dependency for batch2 say this batch be batch_new. Below question that I posted is how this can be built. So the batch_new will look for the source table every 5 minutes and if sysdate is found, it can go ahead and complete the job which triggers the batch2. In this way we will save a lot of time. Hope it clarifies.

Also dear Gurus, I find it a bit difficult to understand the solution that you have provided. If you don't mind, could you please simplify the steps on how I can achieve this.

-Vicky
Rate this response:  
Not yet rated
chulett

Premium Poster


since January 2006

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 12 Nov 2002
Posts: 42762
Location: Denver, CO
Points: 220350

Post Posted: Thu Apr 19, 2018 9:30 am Reply with quote    Back to top    

Thanks. Clarity is always appreciated. Wink

What scheduler are you using, anything in the "enterprise" class?

_________________
-craig

Research shows that 6 out of 7 dwarves aren't happy
Rate this response:  
Not yet rated
vickymsw02
Participant



Joined: 18 Apr 2018
Posts: 5

Points: 49

Post Posted: Thu Apr 19, 2018 10:05 am Reply with quote    Back to top    

Its ESP Scheduler
Rate this response:  
Not yet rated
FranklinE



Group memberships:
Premium Members

Joined: 25 Nov 2008
Posts: 692
Location: Malvern, PA
Points: 6561

Post Posted: Thu Apr 19, 2018 10:06 am Reply with quote    Back to top    

Vicky,

A hypothetical job flow:

Job1 -- Query for update to system date in db table. Loop conditions: if not found, wait five minutes and try again. If found, end and release next job to run.

Job2 -- Extract system date for further processing.

It looks simple, because the complexity is under the covers. You can use a variety of methods to query for the date, including a DataStage job sequence and parallel job, though a script is the most efficient approach for that. The caution I imply here is that you want the query for the date and only that query to be subject to the repeating five-minute cycle.

I asked about your scheduling method, because with assistance from your development group you could possibly eliminate the five-minute cycle. If a job in "Batch 1" is responsible for updating the system date, and that job is clearly identified in your scheduler, you would make it the predecessor job to your DataStage job that uses the system date. It could be a one-to-one, one-time per cycle job flow.

For example, I have jobs that are dependent on the arrival of files from external (to us) sources. We use a file-watcher -- a job that runs a Unix command for the file(s) in question -- and either prompts for a waiting period before running again (file not found) or ends the job and releases the job(s) that use the file(s). The cycling and wait times are all in the scheduler. The main processing job only ever runs once per cycle.

I'm glad you clarified your dependencies. Date-dependent processing demands that cycle complete in less than 24-hours. Since Batch 1 exceeds 24 hours, you have a bad situation for which coding is unlikely to show you a happy path. You have good reasons to be frustrated.

_________________
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: http://www.dsxchange.com/viewtopic.php?t=143596 Using CFF FAQ: http://www.dsxchange.com/viewtopic.php?t=157872
Rate this response:  
Not yet rated
vickymsw02
Participant



Joined: 18 Apr 2018
Posts: 5

Points: 49

Post Posted: Thu Apr 19, 2018 1:23 pm Reply with quote    Back to top    

Thank you, but how do I do a 5 minute check using the looping concept? Batch1 doesn't update the table with sysdate, its done by Mainframe developers manually. I don't have a file to use the wait for file activity
Rate this response:  
Not yet rated
FranklinE



Group memberships:
Premium Members

Joined: 25 Nov 2008
Posts: 692
Location: Malvern, PA
Points: 6561

Post Posted: Thu Apr 19, 2018 1:33 pm Reply with quote    Back to top    

Vicky, all I can say is that you have Mainframitis, and it might be incurable.

First, manual updates to production being rather, ahem, unusual, they should at least apply some discipline and do the update around the same time every day. That doesn't make your job easier, but it does give you something consistent.

I suggest two jobs in ESP. The first one runs a routine -- script, DS job, whatever you determine is best -- cyclically for the sysdate query. When the date update is detected, it stops and releases the main job from ESP that runs what you need for processing based on the date.

Design details are up to you. If you must design things to run independently of ESP, previous suggestions will help. Use job sequence to set up the loop, and put the main processing job in the sequence after the loop completes with the sysdate you need.

_________________
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: http://www.dsxchange.com/viewtopic.php?t=143596 Using CFF FAQ: http://www.dsxchange.com/viewtopic.php?t=157872
Rate this response:  
Not yet rated
FranklinE



Group memberships:
Premium Members

Joined: 25 Nov 2008
Posts: 692
Location: Malvern, PA
Points: 6561

Post Posted: Fri Jun 08, 2018 12:24 pm Reply with quote    Back to top    

I have a further thought around the manual update part. I suggest imposing a technical requirement on the mainframe developers that they accomplish the update with an ESP job. This would eliminate the need for a cyclic approach. You make the DataStage job(s) dependent on the sysdate update job. It/They run after the mainframe guys run their job.

_________________
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: http://www.dsxchange.com/viewtopic.php?t=143596 Using CFF FAQ: http://www.dsxchange.com/viewtopic.php?t=157872
Rate this response:  
Not yet rated
Display posts from previous:       

Add To Favorites
View next topic
View previous topic
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum



Powered by phpBB © 2001, 2002 phpBB Group
Theme & Graphics by Daz :: Portal by Smartor
All times are GMT - 6 Hours