Run and complete a DataStage Job based on a value in a table

A forum for discussing DataStage<sup>®</sup> basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

Post Reply
vickymsw02
Participant
Posts: 5
Joined: Wed Apr 18, 2018 1:24 pm

Run and complete a DataStage Job based on a value in a table

Post by vickymsw02 »

Hello Everyone,

I have a requirement to run and complete a parallel job based on a value in an Oracle table. I'd appreciate your suggestions on how this can be accomplished.

An Oracle table (table1) contains a Date attribute. The date in table1 is updated with the system date once a day, at any time between 12 AM and 11 PM. The DataStage job should check this table every 5 minutes to see whether it has been updated with the system date. Once the table is updated, the DataStage job should complete by writing the value in table1 to a file.

-Vicky
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

It could be just a matter of scheduling the job to run every five minutes looking for a new record in the source table. If it finds it, it runs to completion and does something to mark the triggering record as 'processed'. When it doesn't find a new record, it simply processes nothing.

Another option is to use a Sequence job with a Start/End Loop construct in it that runs the job, waits five minutes (or until the next five-minute mark) via a custom routine and then loops around again.

Lastly, you could also loop / schedule a light-weight polling job whose only task is to look for the triggering record in the table. Based on its success, you could trigger (or not trigger) the actual processing job. I've done this when I only want to crank up a 'heavyweight' processing job if there's something to process. This might also be something to consider since your target is a flat file, as you've mentioned... constantly running a job with a file as its target will constantly recreate that file even when zero records move through the job.
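That light-weight polling idea boils down to a single comparison. Here's a minimal Python sketch of the decision, where `fetch_table1_date` is a hypothetical stand-in for whatever Oracle query mechanism you use (cx_Oracle, sqlplus from a script, etc.):

```python
import datetime

def heavy_job_needed(fetch_table1_date, today=None):
    """Return True once table1's date column has been set to today.

    fetch_table1_date is a zero-argument callable returning a
    datetime.date - a stand-in for the real Oracle query, not an API.
    """
    today = today or datetime.date.today()
    return fetch_table1_date() == today
```

A scheduler would run this check every five minutes and only launch the heavyweight job (the one that writes the file) when it returns True, so the target file is only created when there is actually a value to write.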
-craig

"You can never have too many knives" -- Logan Nine Fingers
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

Craig has good suggestions. I wonder whether your jobs run in isolation, or if you have a scheduling application like Control-M to manage them. If, for example, there were a scheduled job that updates the system date, your DataStage job could just wait for it to run and complete the update. It just seems odd that you would have to run a job 12 times per hour for up to 23 hours.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
vickymsw02
Participant
Posts: 5
Joined: Wed Apr 18, 2018 1:24 pm

Post by vickymsw02 »

Thanks, Craig and Franklin. Yes, we use a third-party scheduler. So can I go ahead with the approach below?


-->Extract date from the table.
-->Start loop
-->Check if the date in the table is equal to system date.
-->If Yes, go ahead and write this to a file.
-->Else wait for 5 minutes. (How do I achieve this inside the loop without using routines? Can I use a Wait For File activity for this?)
-->Exit loop once date matches with sysdate.
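The steps above can be sketched as a plain polling loop. This is illustrative Python of the sequence logic, not a Sequence job; `fetch_table1_date` and `write_value` are hypothetical stand-ins for the Oracle query and the file-writing step:

```python
import datetime
import time

def poll_until_updated(fetch_table1_date, write_value,
                       interval_secs=300, max_polls=276,
                       sleep=time.sleep, today=None):
    """Loop every interval_secs until table1's date equals today,
    then hand the value off and exit the loop.

    max_polls=276 caps the loop at 23 hours of 5-minute checks, so a
    missed update cannot leave the sequence spinning forever.
    """
    today = today or datetime.date.today()
    for _ in range(max_polls):
        value = fetch_table1_date()   # the Oracle query (stand-in)
        if value == today:
            write_value(value)        # e.g. write the date to the file
            return True
        sleep(interval_secs)          # the 5-minute wait
    return False                      # gave up; let the sequence handle it
```

In a real Sequence job the sleep would be the custom routine (or Wait For File timeout) Craig describes, and the cap on iterations is what keeps a missed update from looping forever.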

Please suggest a different approach, with steps, if you have one in mind.

-Vicky
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

It does seem confusing that you need to poll for the change. You noted that this date changes once per day.... what initiates that change? A scheduled system program? A transaction? A web service? Determine what that "thing" is..... ...because there are many effective ways to launch a DS Job "from somewhere else"....whether that is a scheduling tool, or a script, or a web services request, etc.

If this "date" changes itself every few minutes, and you want to capture those as quickly as possible, then polling can be ok....but if it only changes that one time, then polling is a bit of a waste --- you will run the Job far more frequently than it will ever perform the expected work.

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Since it is complicated to add a loop-and-sleep mechanism to a parallel job, you might want to do this whole process in a Job sequence alone. As above, use the Loop functionality and use an external call to issue the Oracle connect-and-query commands, then parse the response to compare it with the current date.

This wouldn't consume as many resources as writing a parallel Job that gets started frequently.

Another approach is not to use DataStage at all. If you add a trigger to your table that calls the Oracle UTL_FILE functionality, you can automagically create a file (on the Oracle server) when the record changes. No polling required.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

There's a lot of concern here with running a Parallel job over and over and over again during the course of the day. Can you clarify something - does this job do more than simply write a value from the table to a file? Or does the job do actual processing work AND you need this file value written?

And as noted, none of these suggestions about looping or polling were meant for use inside a parallel job - that's just not in the nature of the beast. What you'd have is a looping mechanism - be it a Sequence job or script or what have you - that would run the job at a regular interval. Ideally, something much more light-weight would do the actual polling / checking, so the job only runs when it's time to do the work. Perhaps your third-party scheduler could even handle that aspect of it, depending on its capabilities.

For the "wait" question, you really should use a routine for the Sequence job approach. They are simple to build and are really something you should have in your toolbox as a DataStage developer. IMHO. However, yes, you could use a WFF stage for this. The only thing to keep in mind is that, with no actual file involved, when you tell it to wait only five minutes and the time expires, the stage will take the "failure" route when it moves forward. You'll have to explicitly handle that so the sequence knows it's okay rather than having it abort the entire sequence.

My routine calculated how many seconds it would need to sleep in order to wake at the next fifteen-minute mark, rather than simply waiting fifteen minutes. This way the job ran on every quarter hour rather than its start time slowly creeping around the dial, which would have driven me crazy. But maybe that was just me. :wink:
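That "sleep until the next mark" calculation is just clock arithmetic. A rough Python sketch of what such a routine computes (the original would be a DataStage BASIC routine, so this is only an illustration):

```python
import datetime

def seconds_to_next_mark(now=None, interval_minutes=5):
    """Seconds to sleep so the next run lands on a clean interval
    boundary (e.g. :00, :05, :10 for a 5-minute interval) instead of
    drifting later with each cycle's runtime."""
    now = now or datetime.datetime.now()
    interval = interval_minutes * 60
    past = (now.minute * 60 + now.second) % interval
    # Exactly on a mark: wait a full interval rather than zero seconds.
    return interval - past if past else interval
```

Sleeping for this computed value, rather than a flat five minutes, is what keeps the start time pinned to the clock.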
vickymsw02
Participant
Posts: 5
Joined: Wed Apr 18, 2018 1:24 pm

Post by vickymsw02 »

Let me make this clearer. In the current scheduler setup, we have two batches dependent on each other. Batch1 is a mainframe batch that updates the source tables with new values and takes more than 24 hours to run. Batch2 (a DataStage sequence) depends on Batch1 completing, since it does a lookup against the tables Batch1 updates. Since Batch2 cannot wait 24 hours, the mainframe source team has agreed that once the tables Batch2 needs are updated, they will update a table with the system date so that the batch can execute. So in the scheduler we are planning to introduce a new dependency for Batch2; call it Batch_new. The question I posted is how to build this. Batch_new will check the source table every 5 minutes and, once sysdate is found, complete the job, which in turn triggers Batch2. This way we will save a lot of time. Hope that clarifies.

Also, dear gurus, I find it a bit difficult to understand the solutions you have provided. If you don't mind, could you please simplify the steps for how I can achieve this?

-Vicky
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Thanks. Clarity is always appreciated. :wink:

What scheduler are you using, anything in the "enterprise" class?
vickymsw02
Participant
Posts: 5
Joined: Wed Apr 18, 2018 1:24 pm

Post by vickymsw02 »

It's the ESP scheduler.
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

Vicky,

A hypothetical job flow:

Job1 -- Query for update to system date in db table. Loop conditions: if not found, wait five minutes and try again. If found, end and release next job to run.

Job2 -- Extract system date for further processing.

It looks simple, because the complexity is under the covers. You can use a variety of methods to query for the date, including a DataStage job sequence and parallel job, though a script is the most efficient approach for that. The caution I imply here is that you want the query for the date and only that query to be subject to the repeating five-minute cycle.

I asked about your scheduling method, because with assistance from your development group you could possibly eliminate the five-minute cycle. If a job in "Batch 1" is responsible for updating the system date, and that job is clearly identified in your scheduler, you would make it the predecessor job to your DataStage job that uses the system date. It could be a one-to-one, one-time per cycle job flow.

For example, I have jobs that are dependent on the arrival of files from external (to us) sources. We use a file-watcher -- a job that runs a Unix command for the file(s) in question -- and either prompts for a waiting period before running again (file not found) or ends the job and releases the job(s) that use the file(s). The cycling and wait times are all in the scheduler. The main processing job only ever runs once per cycle.
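A file-watcher of that kind reduces to an existence check plus a bounded wait. A minimal Python sketch, with the timings and injection points purely illustrative:

```python
import os
import time

def wait_for_file(path, interval_secs=300, max_waits=12,
                  sleep=time.sleep, exists=os.path.exists):
    """Poll for path; return True as soon as it appears, or False
    after max_waits unsuccessful checks so the scheduler can alert."""
    for _ in range(max_waits):
        if exists(path):
            return True
        sleep(interval_secs)
    return False
```

Wrapped in something like `sys.exit(0 if wait_for_file(path) else 1)`, the scheduler can branch on the exit status: file found releases the main job, timeout raises an alert - the same pattern Franklin describes applying to the sysdate query.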

I'm glad you clarified your dependencies. Date-dependent processing demands that the cycle complete in less than 24 hours. Since Batch 1 exceeds 24 hours, you have a bad situation for which coding is unlikely to show you a happy path. You have good reasons to be frustrated.
vickymsw02
Participant
Posts: 5
Joined: Wed Apr 18, 2018 1:24 pm

Post by vickymsw02 »

Thank you, but how do I do a 5-minute check using the looping concept? Batch1 doesn't update the table with sysdate; it's done by the mainframe developers manually. I don't have a file to use with the Wait For File activity.
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

Vicky, all I can say is that you have Mainframitis, and it might be incurable.

First, manual updates to production being rather, ahem, unusual, they should at least apply some discipline and do the update around the same time every day. That doesn't make your job easier, but it does give you something consistent.

I suggest two jobs in ESP. The first one runs a routine -- script, DS job, whatever you determine is best -- cyclically for the sysdate query. When the date update is detected, it stops and releases the main job from ESP that runs what you need for processing based on the date.

Design details are up to you. If you must design things to run independently of ESP, the previous suggestions will help. Use a job sequence to set up the loop, and put the main processing job in the sequence to run after the loop completes with the sysdate you need.
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

I have a further thought around the manual update part. I suggest imposing a technical requirement on the mainframe developers that they accomplish the update with an ESP job. This would eliminate the need for a cyclic approach. You make the DataStage job(s) dependent on the sysdate update job. It/They run after the mainframe guys run their job.
Post Reply