DSXchange: DataStage and IBM Websphere Data Integration Forum
This topic has been marked "Resolved."
bb98
Participant



Joined: 02 Apr 2017
Posts: 4

Points: 74

Posted: Tue Jan 02, 2018 10:45 am

DataStage® Release: 11x
Job Type: Server
OS: Windows
I have created a parallel job that retrieves a list of employee numbers from a table. The resulting rows are passed to a Transformer stage, which generates additional rows for each employee based on a date range and adds the information needed to call a web service to each row. The output (employee number, date, web info) is then passed to a Web Service Transformer: a web service is called, the result is passed to an XML parser, and the parsed results are processed and written to various tables. The downstream processing includes sort stages, duplicate removal, etc. I think throughput is degraded because these stages are waiting on the results from the parser stage.
My thought is to move the retrieval of the employees, and the looping that creates rows for each employee/date combination, into a job sequence and then pass one employee/date row to the parallel job at a time. I think this will let the parse stage finish more quickly and pass a smaller result set downstream for processing.
My problem is that I do not know how to read the employee table in a job sequence and create a list/array to process in a loop (I have worked out looping through the date range). I could probably use a parallel job to read the employee table, write the results to a file, and go from there - but I need to be able to process multiple "tenants" at one time. I think I need to schedule the job with parameters for tenant 1, another schedule for tenant 2, etc. Therefore, I am concerned about having multiple instances of the job writing to the same file.
Thank you in advance for your help, design guidance etc.
boxtoby



Group memberships:
Premium Members

Joined: 13 Mar 2006
Posts: 132
Location: UK
Points: 1375

Posted: Thu Jan 04, 2018 10:14 am

If I understand correctly, you want to run a job once per employee/date combination and pass the results on to the next process. I don't know if that will be as quick as you hope, but here's how I have done this sort of thing in the past:


Output the list of employees to a flat file.

Turn the flat file into a space- or comma-separated list.

(Or create the list directly in the job that extracts the employees from the table.)

You can then use a loop construct in a sequence to process each "thing" (employee) in the list, passing the employee to the job(s) as a parameter.
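The sequence's StartLoop/EndLoop activities handle delimited lists directly, but the same idea can be sketched in BASIC job control. This is only an illustration - the job name "ProcessEmployee" and the parameter "EMPNO" are assumed names, and the list would really come from your flat file:

```
* Sketch only: loop over a comma-separated employee list and run a
* (hypothetical) job once per employee via the BASIC job control API.
EmpList = "1001,1002,1003"                ;* in practice, built from the flat file
NumEmps = DCount(EmpList, ",")
For i = 1 To NumEmps
   Emp = Field(EmpList, ",", i)           ;* i-th employee in the list
   hJob = DSAttachJob("ProcessEmployee", DSJ.ERRFATAL)
   ErrCode = DSSetParam(hJob, "EMPNO", Emp)
   ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
   ErrCode = DSWaitForJob(hJob)           ;* run serially, one employee at a time
   ErrCode = DSDetachJob(hJob)
Next i
```

In a sequence the StartLoop activity's "List of things to loop over" does the DCount/Field work for you; the loop counter is then referenced as a parameter in the job activity.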


Hope that helps

_________________
Bob Oxtoby
bb98
Participant



Joined: 02 Apr 2017
Posts: 4

Points: 74

Posted: Fri Jan 05, 2018 6:58 pm

Thank you for your response.

I did manage to create a job sequence that called a parallel job to output the list of employees to a sequential file. The job sequence then read the file content and looped through both the file list and a date range, calling the parallel job that executes the web service and processes the results. What a disaster! It worked but, lesson learned, the startup time for each call to the parallel job resulted in worse performance.

Next, I abandoned that approach and tried using a Wave Generator stage in the original parallel job. After two days of troubleshooting, including several hours on the phone with IBM, I have concluded it is not a good idea to put a Wave Generator in front of a Hierarchical stage if that stage is going to generate multiple output links and those links have multiple rows. My guess is that the end-of-wave marker is not processed as expected and cuts off part of the output from the Hierarchical stage.

With a record count of 1 in the Wave Generator, the job would fail with error messages like "cannot insert null value in table". If I set the record count to 1,000,000, the job would again run successfully. That is what makes me think the data was getting cut off inappropriately. Of course, a record count that high defeats the purpose of the Wave Generator. Each SOAP response sent to the Hierarchical stage represents a unit of work and can be processed downstream (which includes sort stages) by itself.

At times I need to run this job for large date ranges - a year or more - so I was trying to get the job to process smaller units of work. If the Wave Generator had done what I expected, I think I would have been OK. For now, I am going back to the job sequence and loop approach, but I am going to loop through dates instead of both employees and dates. I should be able to set up the loop, have the step driven by a parameter, and then loop through the date range one day at a time or multiple days at a time depending on the value I set for the loop step. The parallel job will have to process all of the employees in the database table - but at least I can throttle that back to a number of days as needed.
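In BASIC terms, the stepping I have in mind looks something like this (the dates, the step value, and the date format are just examples, and the window would really be passed to the job as parameters):

```
* Sketch: walk a date range N days at a time. Iconv/Oconv convert
* between external dates and internal day numbers, so arithmetic is
* simple integer math on day counts.
StartDt = Iconv("2018-01-01", "D-YMD[4,2,2]")   ;* internal day number
EndDt   = Iconv("2018-12-31", "D-YMD[4,2,2]")
DayStep = 7                                      ;* days per job run (parameter-driven)
For d = StartDt To EndDt Step DayStep
   WinEnd = d + DayStep - 1
   If WinEnd > EndDt Then WinEnd = EndDt         ;* clamp the last window
   * Convert back to external dates to pass as job parameters:
   FromDate = Oconv(d, "D-YMD[4,2,2]")
   ToDate   = Oconv(WinEnd, "D-YMD[4,2,2]")
Next d
```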

Sorry for the ramble, but if you see a problem with my approach or if there is a better approach I would appreciate your input.

As for the original topic of this post - selecting rows from a table and using the results as parameters to a job - I think you have confirmed my lack of search results. That is, in a job sequence it is not possible to read values from a database table, put those values into a list/array, and then process the list - at least not without writing a custom routine, a shell script, etc. I am slowly coming to accept that with job sequences, the best way to process records like this is to call a job to create a flat file, then process the flat file.

Thanks again for your response.
chulett

Premium Poster


since January 2006

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 12 Nov 2002
Posts: 42574
Location: Denver, CO
Points: 219079

Posted: Fri Jan 05, 2018 10:01 pm

For a simple task like that, use a Server job. It will be up, have had breakfast, and be done with all of its work before the parallel job gets its pants on. Wink

And please don't tell me you're one of those sites where they are "not allowed". Rolling Eyes

_________________
-craig

I know I don't say this enough, but I like when you talk to me. It's much better than when nobody talks to me. Or when people that I don't like will not stop talking to me.
UCDI



Group memberships:
Premium Members

Joined: 21 Mar 2016
Posts: 327

Points: 3307

Posted: Mon Jan 08, 2018 1:23 pm

IBM's training class instructors flat-out told us not to use Server jobs, saying they were there only for porting old code, never for new work. Things like that get heard by people in charge who don't actually code, and then you have a policy...

We avoided them for a while, but someone was finally able to cut through the red tape.
Thomas.B
Participant



Joined: 09 Apr 2015
Posts: 61
Location: France - Nantes
Points: 404

Posted: Wed Jan 10, 2018 11:14 am

bb98 wrote:
As for the original topic of this post - selecting rows from a table and using the results as parameters to a job, I think you have confirmed my lack of results from searching. That is, in a job sequence, it is not possible to read values from a database table, put those values into a list / array and then process the list. At least not without writing a custom routine or a shell script etc. I am slowly coming to accept that with job sequences, the best way to process records like this is to call a job to create a flat file, then process the flat file.

You can send the result of an SQL query to a job parameter; to do it, call a server job from your sequence job.
The server job executes the SQL query and sends its result to a transformer. The transformer can send the string back to the sequence job via a routine that calls the DSSetUserStatus() function.

You can then use the #YourServerJob.$UserStatus# expression in the sequence job to pass the SQL query result into a job parameter.
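The transformer-side routine is tiny. A minimal sketch (the routine name and single argument are assumed; Arg1 would be the delimited list your transformer builds):

```
* SetSeqUserStatus (hypothetical routine name), called from a server-job
* transformer. Stores Arg1 as the job's user status, which the calling
* sequence can read as #YourServerJob.$UserStatus#.
Call DSSetUserStatus(Arg1)
Ans = Arg1    ;* routines must return a value; echo the input
```

One caveat: the user status is a single string, so a multi-row query result has to be collapsed into one delimited value before the call.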

_________________
BI Consultant
Business & Decision
bb98
Participant



Joined: 02 Apr 2017
Posts: 4

Points: 74

Posted: Thu Jan 11, 2018 6:15 pm

Thanks for the feedback. I will research the server job approach and, under the assumption the advice is solid, consider the issue resolved. Smile