DataStage jobs folder organization

A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy


Poll: Organize jobs and group them by Source or group them by Targets?

Group by Source: 1 vote (25%)
Group by Target: 3 votes (75%)

Total votes: 4

jreddy
Premium Member
Posts: 202
Joined: Tue Feb 03, 2004 5:09 pm

DataStage jobs folder organization

Post by jreddy »

This is more of an architecture question, and I am seeking different perspectives to see which choice is more popular across implementations.
When a project has over 1,500 jobs, with data coming in from 5-10 sources all being transformed together to be loaded into an enterprise data warehouse, how would you go about organizing those jobs in an intuitive way?

We could potentially name the jobs so you would know which source the data is coming from and where it is being loaded, but if the jobs are grouped logically it is easier to get to them. Moreover, that naming standard won't work when a single job processes multiple sources or multiple targets.

Keep both the initial development perspective and future maintenance in mind: the design of job folders exists only to help developers develop quickly and debug quickly in case of production issues, new requirements, impact assessments, etc. Being able to find a job fast is probably the biggest criterion. With that in mind, would you normally group and organize jobs by sources or by targets (the data warehouse tables)?

Thanks in advance !!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

We prefer to group by neither source nor target but, instead, by major subject area and then by job type (update, insert, recovery, sequence, etc.). For example, the Allowances subject area will have sub-folders Update Jobs, Insert Jobs, Recovery Jobs and Sequence Jobs.
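
To make that layout concrete, here is a minimal sketch in Python of deriving a repository folder path from job metadata, following the subject-area / job-type scheme above. The type codes and the helper function are hypothetical illustrations, not part of any actual project.

# Minimal sketch (hypothetical names): derive a repository folder path
# from job metadata, following the subject-area / job-type layout.
JOB_TYPES = {
    "UPD": "Update Jobs",
    "INS": "Insert Jobs",
    "RCV": "Recovery Jobs",
    "SEQ": "Sequence Jobs",
}

def folder_path(subject_area, job_type_code):
    """Return a folder path such as 'Allowances/Update Jobs'."""
    if job_type_code not in JOB_TYPES:
        raise ValueError("unknown job type code: %s" % job_type_code)
    return "%s/%s" % (subject_area, JOB_TYPES[job_type_code])

print(folder_path("Allowances", "UPD"))  # prints: Allowances/Update Jobs
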
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

A project is usually one target. Jobs within that project are grouped by source, and usually one source is handled within one sequence, so the jobs are grouped by sequence. A master sequence may pull from a bunch of sources, but this is the way I do it most of the time.
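
As a rough illustration of that grouping, here is a small Python sketch that buckets jobs by source, one sequence per source; the job and source names are made up for the example.

# Hypothetical grouping: jobs keyed by source, one sequence per source.
from collections import defaultdict

jobs = [("LoadCustomers", "CRM"), ("LoadOrders", "ERP"), ("LoadContacts", "CRM")]
by_source = defaultdict(list)
for job, source in jobs:
    by_source[source].append(job)

for source, source_jobs in sorted(by_source.items()):
    print("Seq_%s: %s" % (source, ", ".join(sorted(source_jobs))))
# Seq_CRM: LoadContacts, LoadCustomers
# Seq_ERP: LoadOrders
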
Mamu Kim
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

I like to organize the folders at a high level by execution order (Extract, Transform, then Load); at the next level under Extract, folders by source system; and at the next level under Transform, folders by subject area, if it is a data warehouse, although most of our projects are not data warehouse related.

When the folder names do not cooperate with the sorting we want, we prefix them with "01 ", "02 " and so on.
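
A quick Python illustration of why the prefixes matter: plain alphabetical sorting puts the three phases into E, L, T order rather than execution order, while the numeric prefixes force E, T, L. The folder names are made up.

# Alphabetical sort alone yields Extract, Load, Transform (E, L, T);
# numeric prefixes restore the execution order (E, T, L).
unprefixed = ["Transform", "Load", "Extract"]
prefixed = ["02 Transform", "03 Load", "01 Extract"]

print(sorted(unprefixed))  # ['Extract', 'Load', 'Transform']
print(sorted(prefixed))    # ['01 Extract', '02 Transform', '03 Load']
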
Choose a job you love, and you will never have to work a day in your life. - Confucius
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

In financials, I find the best logical grouping to be by account. However, I also needed to adjust that as the project got larger, so my suggestion is to find the best logical starting point. That approach gave me the most flexibility to create sub-groups.

Low-level design choices (common attributes like source or destination) prepared me to find different logical groupings which, with good documentation, made finding things easier during production incident support.

In the end, though, with our batch system being very large, grouping by job run timing helped the most.

In my application environment, the naming convention was the most critical design decision. It's the best way to find the "groups" of jobs most likely to be affected by system issues. For example, each timeframe group has an "on-off" switch: a dummy job which, when put on hold, ensures that an earlier incident doesn't also cause later jobs to fail.

I came from a mainframe development area, so I found it best to keep to that pattern. Every job entry point -- we schedule using Control-M -- is a job sequence with the same name as the CM job. The CM description points to the folder path in Director. Job sequences and the parallel jobs they invoke have the same folder paths.
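
As a hedged sketch of how such a naming convention could be verified, the following Python snippet cross-checks Control-M job names against DataStage sequence names. The names are invented for the example; a real check would read them from the scheduler and from a project export rather than from literals.

# Hypothetical cross-check: every Control-M job name should have a
# matching DataStage job sequence of the same name.
controlm_jobs = {"DWH_LOAD_DAILY", "DWH_LOAD_WEEKLY"}  # from the scheduler
sequences = {"DWH_LOAD_DAILY", "DWH_LOAD_MONTHLY"}     # from the project

missing = sorted(controlm_jobs - sequences)
if missing:
    print("Control-M jobs with no matching sequence:", missing)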

Nothing beats clear and concise documentation. Our support people don't need to know more than how to use Director to investigate and triage failed jobs.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596
Using CFF FAQ: viewtopic.php?t=157872