Hi all,
I need to create two files (file sets) for two lookups based on DBMS data.
I can either build two jobs, one to fill each file, or one job that fills both files.
What is the best practice? One job per unit of work, or maybe one job per task?
Thanks
division into jobs
My personal preference is generally to use separate jobs for the sake of restartability. When something goes wrong and some jobs complete but another aborts, it can be easier and faster to troubleshoot the problem in a simpler job, and only the aborted job needs to be restarted. There's no extra or repeat processing of logic that already completed successfully.
Choose a job you love, and you will never have to work a day in your life. - Confucius
+1
We always try to build jobs as atomic, restartable units of work for precisely the reasons mentioned above. And we wrap them in a "framework" that knows how to back out any partially completed loads (where applicable) so that restarts can be as "hands off" as possible.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
Well... a girl's got to keep some secrets. But high level:
It's basically a set of tables that record which jobs have run and assign a unique id to each run of each job. All records inserted or updated by the run are tagged with that number. We also have control tables that document which tables each job targets and what 'rollback mechanism' to use for each. When a failed job is restarted, a stored procedure is called that looks up the mechanism and the id of the failed run and resets the table back to its pre-run condition. For example, type 2 updates have their new record deleted and the previous entry set back to 'current'.
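To make that a bit more concrete, here's a rough sketch of what such control tables and a type 2 rollback could look like. Every table, column and value name below is made up for illustration; it's not our actual schema, just the general shape of the idea:

-- Hypothetical run log: one row per execution of a job
CREATE TABLE etl_run_log (
    run_id      INTEGER      NOT NULL PRIMARY KEY,
    job_name    VARCHAR(100) NOT NULL,
    started_at  TIMESTAMP    NOT NULL,
    finished_at TIMESTAMP,
    status      VARCHAR(10)  NOT NULL    -- e.g. RUNNING, COMPLETE, FAILED
);

-- Hypothetical registry of which tables each job loads and how to back them out
CREATE TABLE etl_job_target (
    job_name           VARCHAR(100) NOT NULL,
    target_table       VARCHAR(128) NOT NULL,
    rollback_mechanism VARCHAR(20)  NOT NULL  -- e.g. DELETE_BY_RUN, TYPE2_RESET
);

-- Type 2 rollback for one target: remove the rows the failed run inserted,
-- then flip the versions it expired back to 'current'.
-- (Assumes each row carries the id of the run that created it and,
--  once closed, the id of the run that expired it.)
DELETE FROM dim_example
 WHERE created_by_run = :failed_run_id;

UPDATE dim_example
   SET current_flag = 'Y',
       effective_end_date = NULL
 WHERE expired_by_run = :failed_run_id;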
All jobs have these pipelines incorporated into them (something we call their job control framework), set to run in this order:
1. Check for and perform rollback if needed
2. Initialize a new run in the control tables
3. <actual work goes here>
4. Finalize the run in the control tables
Note that we're currently using Informatica for this, which has a "Target Load Plan" setting where you specify the order the pipelines run in, one after the other, for any given mapping. It's been long enough that I'm not quite sure how you would accomplish something equivalent in DataStage. I'd be curious whether others are doing something similar.
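For what it's worth, steps 1, 2 and 4 mostly boil down to a handful of statements against those same (hypothetical) control tables. A rough sketch, using the made-up names from above; in DataStage you could presumably fire these from before/after-job subroutines or leading/trailing stages, but that part is a guess:

-- Step 1: see whether the previous run of this job failed and needs backing out
SELECT run_id
  FROM etl_run_log
 WHERE job_name = :job_name
   AND status   = 'FAILED';
-- (if a row comes back, call the rollback procedure described above for that run_id)

-- Step 2: register the new run; everything this run writes gets tagged with :new_run_id
INSERT INTO etl_run_log (run_id, job_name, started_at, status)
VALUES (:new_run_id, :job_name, CURRENT_TIMESTAMP, 'RUNNING');

-- Step 4: finalize the run once the real work (step 3) has finished
UPDATE etl_run_log
   SET status      = 'COMPLETE',
       finished_at = CURRENT_TIMESTAMP
 WHERE run_id = :new_run_id;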
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers