Throttle a DS Parallel job which invokes a Rest Service

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
tanumoy2017
Participant
Posts: 6
Joined: Tue Oct 10, 2017 12:32 pm

Throttle a DS Parallel job which invokes a Rest Service

Post by tanumoy2017 »

Hi Team

In our project we have a requirement as stated below. Please share your views on how to achieve the required behavior.

We read data from an Oracle source, build the request payload in JSON using the Hierarchical stage's JSON composer, and store it in an Oracle table. From this request table we again use the Hierarchical stage to call a REST endpoint via REST service, get the response back, and store it in another Oracle response table. Our project runs on a non-grid, single-node configuration, so only one request is fired at a time.

Now the requirement is to control the number of requests to the REST endpoint per minute, since the underlying Java application can process up to 100 requests per minute.

So, how do I control the throttling speed when calling the REST service?

Thanks
Tanumoy
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

I think you also need to come to an understanding with your java source folks. It's not really 100 per minute; it's some number X of concurrent requests.

100 requests spread across 60 seconds is different from 100 requests fired within the first second of that minute, followed 60 seconds later by 100 more instantaneous requests.

If you really want to single-thread your workload, you could use the REST API to put each unit of work on a queue. Then job #2 loops and processes those requests individually, one at a time in FIFO order.
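The queue idea above can be sketched outside DataStage in a few lines of Python. This is a minimal single-threaded illustration, not anything DataStage-specific: job #1 enqueues the units of work, job #2 drains them strictly one at a time in FIFO order (the function and argument names here are my own, for illustration):

```python
import queue

def enqueue_and_drain(requests, process):
    """Single-threaded FIFO: one loop enqueues units of work (job #1),
    a second loop drains them one at a time, in order (job #2)."""
    q = queue.Queue()
    for r in requests:        # job #1: put each unit of work on the queue
        q.put(r)
    results = []
    while not q.empty():      # job #2: strictly one at a time, FIFO
        results.append(process(q.get()))
    return results
```

Because only one `process` call is ever in flight, concurrency is exactly 1 regardless of how fast the requests arrive.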

The overhead of spinning up your DataStage job may by itself be enough to keep you under the 100-per-minute limit.

If your guys are telling you 100 max per minute, add some elbow room (if not already done) and aim lower.


It's hard for us to recommend a course of action since there isn't much to go on here.
tanumoy2017
Participant
Posts: 6
Joined: Tue Oct 10, 2017 12:32 pm

Post by tanumoy2017 »

Hi @PaulVL

Thanks for your reply.

Our REST service has been performance-tested at only 100 transactions per minute, and we do not want concurrent requests: the source can update the same customer's data for different attributes, which touches some core common underlying tables, and we do not want to cause deadlocks in the DB.
So process-wise we can go up to 100 requests per minute, no more, on a single node so that we process sequentially. It is also fine if all 100 requests complete in less than a minute, but the next set of 100 should only start firing after the previous minute has elapsed.

I think we do not need a queue and looping, since we run on a single node only; requests and responses go one at a time.

What we need is :

1. How do we control this throttling speed of 100 transactions per minute in the hierarchical stage?

Note: We are considering this option: read 100 records, call the REST API, measure the time taken for those 100 responses, check whether the full minute has elapsed, and if not, sleep for the remaining time before looping back to read the next 100. Let us know if this is the only solution available, or whether you can think of a better one.
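The batch-and-sleep scheme described in the note can be sketched as follows. This is a minimal Python illustration of the timing logic only; `fetch_batch` and `send` stand in for the DataStage read and REST-call steps, and the names are my own:

```python
import time

def windowed_batches(fetch_batch, send, batch_size=100, window=60.0):
    """Send one batch per window: fire up to batch_size requests
    sequentially, then sleep out the remainder of the window
    before starting the next batch."""
    while True:
        batch = fetch_batch(batch_size)
        if not batch:
            break                          # no more requests to process
        start = time.monotonic()
        for req in batch:
            send(req)                      # one request at a time (single node)
        elapsed = time.monotonic() - start
        if elapsed < window:
            time.sleep(window - elapsed)   # wait out the rest of the minute
```

This matches the stated requirement exactly: the batch may finish early, but the next 100 only start once the previous minute is up.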

Thanks
Tanumoy
JRodriguez
Premium Member
Posts: 425
Joined: Sat Nov 19, 2005 9:26 am
Location: New York City
Contact:

Post by JRodriguez »

Well, normally we are asked to speed processes up... but this can be done. Here is one way:

To the request table, add a column to flag request records as processed/not processed. Default the value to "N" on insert.

I would guess that you have a DS sequence... so after the job that populates the request table, do the following:
- Add a new DS job to create a list of request IDs that should be processed (all records with flag "N"). You could dump the request IDs into a flat file.
- Next, cat the file that contains the list of requests.
- Add a loop to the DS sequence and configure the loop to use the values from the file you just cat'ed.
Inside the loop, execute the existing DS job that reads the request table, uses the hierarchical stage to call the REST endpoint via REST service, and stores the response back in the other response Oracle table.
You would need to add a parameter "RequestId" to this job, and change the extraction SQL to fetch only the request passed in from the loop value. This job will execute once for each request.

Update the flag to "Y" in the request table, either in the previous job or in a new one.

- Add a command activity after the previous job that runs the UNIX sleep command, starting at zero seconds; tune this value until you get the desired 100 requests per minute. You might not need the sleep command at all...

- Close the loop.

If the process is too slow, adjust the strategy to pass a range of request IDs to the job instead of a single ID.
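The per-request loop with a tuned sleep amounts to even spacing: 60 s / 100 requests = 0.6 s per slot. A minimal Python sketch of that pacing, where `runner` stands in for whatever launches the DS job for one request ID (for example a wrapper around the `dsjob -run` CLI; the exact command line and project/job names are assumptions, not taken from the post):

```python
import time

def run_per_request(ids, runner, per_minute=100):
    """Launch one job invocation per request ID, evenly spaced so
    that no more than per_minute launches happen in any minute."""
    interval = 60.0 / per_minute           # 0.6 s per slot at 100/min
    for rid in ids:
        start = time.monotonic()
        runner(rid)                        # e.g. invoke the DS job for this ID
        spare = interval - (time.monotonic() - start)
        if spare > 0:
            time.sleep(spare)              # pad out the slot, like the sleep activity
```

Unlike the batch-per-minute scheme, this spreads the load evenly across the minute, which is gentler on the downstream Java application.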
Julio Rodriguez
ETL Developer by choice

"Sure we have lots of reasons for being rude - But no excuses
Post Reply