Automated/semi-automated generation of Datastage jobs

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


anamika
Participant
Posts: 16
Joined: Sat Feb 27, 2016 9:43 am
Location: Ottawa

Automated/semi-automated generation of Datastage jobs

Post by anamika »

Hello,
I have been experimenting with automated generation of DSX jobs using templates. I have also found a lot of information on Dsexchange and other forums about parsing DSX files, templates, scripts, etc. Most of those scripts and write-ups discuss DSX job automation/generation using pre-generated templates.
I would like to take a step beyond that and hear about automated job generation without pre-generated templates. What experiences have people had attempting to dynamically generate DSX files? What were the issues, challenges, and feasibility concerns?
I have had good success migrating Cognos Data Manager jobs to DataStage using pre-generated templates, but I would now like to be able to dynamically generate DataStage jobs from a set of specifications.

Thanks
ETL, DW, BI Consultant
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

The amount of work involved would be overwhelming compared to just making generic parameterized jobs that can be re-used in a variety of ways. The same goes for the template approach, really -- you can usually just make the job generic instead of having a billionty copies of a job with minor differences across them.

I don't think this is a great path forward. That is just one opinion, though :)
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: Automated/semi-automated generation of Datastage jobs

Post by chulett »

anamika wrote:also found a lot of information on Dsexchange
Ummm... suggest you do NOT go there while at work. :wink:

On a related note, a colleague and I many moons ago did exactly this. But it was a simpler time - we were generating Server jobs (as dsx files to be imported) from table metadata, and the jobs were brain-dead "move everything from A to B" type jobs. We needed these because Server has no RCP concept and we had 400 - 500 jobs of this ilk to handle some CDC tables. It was a fun little project, but I'm not sure I'd want to try it again with anything more complex.
-craig

"You can never have too many knives" -- Logan Nine Fingers
anamika
Participant
Posts: 16
Joined: Sat Feb 27, 2016 9:43 am
Location: Ottawa

Post by anamika »

Thanks for the quick response and sharing your thoughts - I have noted your advice. I do have some additional questions for responders.
What were the steps and tasks involved in generating job(s) without a template? Did you have specific DSX/XML code blocks for each stage type, the connecting links, the osh code generation, and so on?
How are you estimating the work involved, and based on what metrics - would you mind elaborating?
I am mostly satisfied with the template approach - it has been working well - but I have a need to go beyond it.

Thanks again for your input.
ETL, DW, BI Consultant
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

I have not actually done it. I have, however, spent a good bit of time parsing the extract files for various reasons, and from that experience I would not personally care to try to generate a working job of any degree of complexity. Anything I could do with a simple approach I can do in a generic job with RCP. Is there something you want to do that you can't do this way? I am terribly curious... because this feels about like the DataStage version of writing assembly language to add two numbers together.

Templates are easy; it's really a fancy search and replace function.
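For illustration, that "fancy search and replace" approach might look like the minimal Python sketch below: export a job once, mark it up with placeholders, then fill it per table. The placeholder tokens (`<<JOB_NAME>>`, `<<SOURCE_TABLE>>`) and the template fragment are invented for illustration - they are not real DSX syntax.

```python
# Minimal sketch of template-based job generation: fill placeholders in a
# pre-exported DSX template, once per table, producing one importable file each.
# The placeholder tokens and template text below are hypothetical examples.

def fill_template(template: str, values: dict) -> str:
    """Replace every <<NAME>> placeholder with its value."""
    for name, value in values.items():
        template = template.replace(f"<<{name}>>", value)
    return template

# Stand-in for a real exported job definition with placeholders edited in.
TEMPLATE = (
    'BEGIN DSJOB\n'
    '   Identifier "<<JOB_NAME>>"\n'
    '   Source "<<SOURCE_TABLE>>"\n'
    'END DSJOB\n'
)

tables = ["CUSTOMER", "ORDERS"]
jobs = {
    f"Load_{t}": fill_template(
        TEMPLATE, {"JOB_NAME": f"Load_{t}", "SOURCE_TABLE": t}
    )
    for t in tables
}
```

Each resulting string could then be written to its own .dsx file and imported; the hard part in practice is producing a template whose filled-in result still compiles, which is UCDI's point.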


Actually, I just would not do it. I would rather write the annoying database connection gibberish in C, pull the data in, process it in C, and spit it back out than try to write C that generates a DSX file to do the same thing. The full C code would run faster and be easier to understand. The crafted DSX would be harder to write, slower, and have little in its favor. Even if I had to multithread the C, I would prefer it.

Now, what would be cool is a program that could read and write a DataStage-compatible dataset. That would let you just run a unix command to invoke your C program in the middle of your job seamlessly, if you had a batch rather than record-level need.
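Short of reading the proprietary Dataset format, the usual workaround for the "unix command in the middle of your job" idea is an external program that filters a stream of delimited records on stdin/stdout. A rough Python sketch, where the pipe-delimited three-field layout is an assumed example rather than anything DataStage mandates:

```python
import sys

def transform(line: str) -> str:
    """Example record-level transform: upper-case the second
    pipe-delimited field. The layout is an assumed example."""
    fields = line.rstrip("\n").split("|")
    if len(fields) > 1:
        fields[1] = fields[1].upper()
    return "|".join(fields)

def main(stdin=sys.stdin, stdout=sys.stdout):
    # Read records from stdin, write transformed records to stdout,
    # so the program can sit in the middle of a pipeline.
    for line in stdin:
        stdout.write(transform(line) + "\n")

if __name__ == "__main__":
    main()
```

Because the program only touches stdin and stdout, the same pattern works in C (or anything else) and keeps the job itself generic.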
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

There have been use cases in the past, such as scenarios where people wanted to write their own custom GUI for DataStage. Way....back.....in....release.....4.....we even had an API for gui extensions........

I tend to agree wholeheartedly with UCDI. Why bother? Once you get into more complex scenarios with DataStage, the number of combinations you would have to consider in order to produce a "compile-able" Job becomes enormous. There are certain patterns among Stage types and typical patterns of DataStage Job deployment, but there are also exceptions for various technologies that don't fit the regular paradigm. RCP can do a lot of things for generic purposes... Job generation is a difficult thing --- (see "FastTrack").

...but you may have a valid use case. Tell us more.

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>