Select rows from table and use results as parameter to job

A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

bb98
Participant
Posts: 4
Joined: Sun Apr 02, 2017 7:54 am


Post by bb98 »

I have created a parallel job that retrieves a list of employee numbers from a table. The resulting rows are passed to a transformer stage, which generates additional rows for each employee based on a date range and adds the information needed to call a web service to each row. The output (employee number, date, web info) is then passed to a Web Service Transformer - a web service is called, the result is passed to an XML parser, and the parsed results are processed and written to various tables. The downstream processing includes sort stages, removal of duplicates, etc. I think throughput is degraded because these stages are waiting on the results from the parser stage.
My thought is to move the retrieval of the employees, and the looping to create rows for each employee/date combination, into a job sequence and then pass one employee/date row to the parallel job at a time. I think this will allow the parser stage to finish more quickly and pass a smaller result set downstream for processing.
My problem is I do not know how to read the employee table in a job sequence and create a list/array to process in a loop (I have worked out looping through the date range). I think I could probably use a parallel job to read the employee table, write the results to a file and go from there - but I need to be able to process multiple "tenants" at one time. I think I need to be able to schedule the job with parameters for tenant 1, another schedule for tenant 2, etc. Therefore, I am concerned about having multiple instances of the job writing to a file.
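One idea I am considering for that concern (untested, and the parameter name is mine): make the extract job multiple-instance and embed the tenant in the file name through a job parameter, so each schedule writes its own file. For example, the File property of the Sequential File stage would be something like:

   /data/staging/employee_list_#pTenant#.txt

with one schedule per tenant, each passing its own value for pTenant.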
Thank you in advance for your help, design guidance etc.
boxtoby
Premium Member
Posts: 138
Joined: Mon Mar 13, 2006 5:11 pm
Location: UK

Post by boxtoby »

If I understand correctly, you want to run a job once per employee/date combination and pass the results on to the next process. I don't know whether that will be quicker, as you hope, but here's how I have done this sort of thing in the past:


Output the list of employees to a flat file.

Turn the flat file into a space- or comma-separated list (a routine sketch for this step is below, after the list).

(Or create the list directly in the job which extracts the employees from the table).

You can then use a loop construct in a sequence to process each "thing" (employee) in the list, passing the employee to the job(s) as a parameter.
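For the file-to-list step, a server routine along these lines can do it (an untested sketch - the routine name and argument are mine; it reads a one-column flat file and returns the rows as a comma-separated string):

   * Body of a server routine, e.g. FileToList(FileName):
   * read each line of the file and append it to Ans,
   * separated by commas; Ans is the routine's return value.
   Ans = ""
   OpenSeq FileName To F Then
      Loop
         ReadSeq Line From F Else Exit
         If Ans = "" Then Ans = Line Else Ans = Ans : "," : Line
      Repeat
      CloseSeq F
   End Else
      Call DSLogWarn("Cannot open " : FileName, "FileToList")
   End

Call it from a UserVariables activity (e.g. EmpList = FileToList(pFileName)) and point the StartLoop activity's list loop at that variable, with comma as the delimiter.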


Hope that helps
Bob Oxtoby
bb98
Participant
Posts: 4
Joined: Sun Apr 02, 2017 7:54 am

Post by bb98 »

Thank you for your response.

I did manage to create a job sequence that called a parallel job to output the list of employees to a sequential file. The job sequence then read the file content and looped through both the file list and a date range, calling the parallel job that executes a web service and processes the results. What a disaster! It worked but, lesson learned, the startup time for each call to the parallel job resulted in worse performance.

Next, I abandoned that approach and tried using a Wave Generator stage in the original parallel job. After 2 days of troubleshooting, including several hours on the phone with IBM, I have concluded it is not a good idea to put a wave generator in front of a Hierarchical stage if that stage is going to generate multiple output links and those links have multiple rows. My guess is the end-of-wave marker is not processed as expected and cuts off part of the output from the Hierarchical stage. I found that with a record count of 1 in the wave generator, the job would fail with error messages like "cannot insert null value in table". If I set the record count to 1,000,000 then the job would again run successfully. That is what makes me think the data was getting cut off inappropriately. But of course a high record count like that defeats the purpose of the wave generator. Each SOAP response sent to the Hierarchical stage represents a unit of work and can be processed downstream (which includes sort stages) by itself.

At times, I need to run this job for large date ranges - a year or more. So I was trying to get the job to process smaller units of work. If the wave generator had done what I expected it to do, I think I would have been OK. For now, I am going back to the job sequence and loop approach, but I am going to loop through dates instead of both employees and dates. I think I should be able to set up the loop, have the step driven by a parameter, and then loop through the date range one day at a time or multiple days at a time depending upon what value I set for the loop step. The parallel job will have to process all of the employees in the database table - but at least I can throttle that back to a number of days as needed.
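Concretely, the plan for the date loop is something like this (an untested sketch - the activity labels and parameter names are mine). A UserVariables activity converts the date range to internal day numbers with Iconv, the StartLoop counts through them in steps, and each pass converts the counter back to a date for the parallel job with Oconv:

   UserVariables activity (label uvDates):
      FromDay = Iconv(pStartDate, "D-YMD[4,2,2]")
      ToDay   = Iconv(pEndDate, "D-YMD[4,2,2]")

   StartLoop activity (label DateLoop), numeric loop:
      From: uvDates.FromDay   To: uvDates.ToDay   Step: pStepDays

   Job activity parameter inside the loop:
      pRunDate = Oconv(DateLoop.$Counter, "D-YMD[4,2,2]")

That keeps each unit of work bounded by pStepDays without paying the per-employee job startup cost again.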

Sorry for the ramble, but if you see a problem with my approach, or if there is a better approach, I would appreciate your input.

As for the original topic of this post - selecting rows from a table and using the results as parameters to a job - I think you have confirmed what my searching suggested: in a job sequence, it is not possible to read values from a database table, put those values into a list/array, and then process the list, at least not without writing a custom routine or a shell script, etc. I am slowly coming to accept that with job sequences, the best way to process records like this is to call a job to create a flat file, then process the flat file.

Thanks again for your response.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

For a simple task like that, use a Server job. It will be up, have had breakfast, and be done with all of its work before the parallel job gets its pants on. :wink:

And please don't tell me you're one of those sites where they are "not allowed". :roll:
-craig

"You can never have too many knives" -- Logan Nine Fingers
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

IBM's training class instructors flat out told us not to use server jobs, as they were there only for porting old code, never for new work. Things like this get heard by people in charge who don't actually code, and then you have a policy...

We were avoiding them for a while, but someone finally was able to get through the red tape and overcome it.
Thomas.B
Participant
Posts: 63
Joined: Thu Apr 09, 2015 6:40 am
Location: France - Nantes

Post by Thomas.B »

bb98 wrote:As for the original topic of this post - selecting rows from a table and using the results as parameters to a job - I think you have confirmed what my searching suggested: in a job sequence, it is not possible to read values from a database table, put those values into a list/array, and then process the list, at least not without writing a custom routine or a shell script, etc. I am slowly coming to accept that with job sequences, the best way to process records like this is to call a job to create a flat file, then process the flat file.
You can send the result of an SQL query to a job parameter; to do it, call a server job from your sequence job.
The server job executes the SQL query and sends its result to a transformer. The transformer can send the string back to the sequence job via the job's user status area (a small routine wrapping the DSSetUserStatus() call).

You can then use #YourServerJob.$UserStatus# in the sequence job to pass the SQL query result into a job parameter.
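A minimal sketch of that routine (untested here; DSSetUserStatus is the documented BASIC call, and the function wrapper is needed because a transformer derivation calls functions rather than subroutines):

   * Body of a transform function, e.g. SetUserStatus(Arg1):
   * stores Arg1 as the job's user status and passes it through,
   * so it can be used directly in an output column derivation.
   Call DSSetUserStatus(Arg1)
   Ans = Arg1

In the server job, roll the query rows up into one delimited string (a stage variable works), derive an output column as SetUserStatus(ThatString), and the sequence then sees the value as #YourServerJob.$UserStatus#.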
BI Consultant
DSXConsult
bb98
Participant
Posts: 4
Joined: Sun Apr 02, 2017 7:54 am

Post by bb98 »

Thanks for the feedback. I will research the server job approach and, under the assumption the advice is solid, consider the issue resolved. :)