ETL Modeling suggestions

A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.


jackson.eyton
Premium Member
Posts: 145
Joined: Thu Oct 26, 2017 10:43 am

ETL Modeling suggestions

Post by jackson.eyton »

Hi, I just wanted to get some pointers or opinions on what you guys use for relaying ETL instructions from the data architect. Currently we have these set up in a spreadsheet and it's not the easiest thing to navigate and/or understand. I'm investigating FastTrack but I'm not sold on it yet. Our modeler works from source to target, whereas on the ETL side we work from the target backwards. I'd love to hear any suggestions.
-Me
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

I once worked in a shop with an ETL Architect who relied on a combination of Word and Excel documents. I have not worked at any place that has embraced FastTrack. Personally, I prefer logical source to target (reading from left to right) and always felt that documenting the other way was backwards, especially because it doesn't match how you actually implement the DataStage jobs. Where I am now we just rely on the whiteboard to sketch out ETL job designs. It's much more productive! If someone needs a formal document, we can simply go into the job(s) in Designer and create a job report. Boom! Document generated.
Choose a job you love, and you will never have to work a day in your life. - Confucius
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Back when we had an ETL Architect and a ton of new development going on, we also relied on "Source to Target" mapping spreadsheets to officially document the process. Most of them weren't really needed by the developers, as we had templates that handled the common scenarios. I mention them because they were the "backwards" kind and actually mapped Target to Source in spite of their names... which worked for us. :)
-craig

"You can never have too many knives" -- Logan Nine Fingers
jackson.eyton
Premium Member
Posts: 145
Joined: Thu Oct 26, 2017 10:43 am

Post by jackson.eyton »

Yea, we are doing a combination of both methods. Our architect models a particular table from our old database into the new model. That table from the old world will populate several different fields in many different destination tables. So, from his perspective, he adds a source to the sheet and writes his version of the ETL definitions for each column in that source table. On the ETL side, once the modeling is done for a given change/addition, we limit the sheet by the destination entity and add in the appropriate sources and unions to populate that destination.

I have two issues with the way we do it now. One is that, it being a spreadsheet, I cannot alter or amend the modeler's ETL rules to reflect what we actually had to do without checking out the file, which is often in use. This leads to the changes being forgotten, and over time we have ETL rules that only semi-apply. So when I, the newer guy, attempt to work off the sheet, I am often confused by discrepancies between the ETL job and the modeling worksheet. Secondly, it is REALLY difficult to get a big-picture view of source to target.

We're playing with creating a list view in SharePoint to at least allow more dynamic multiparty alterations, but I feel like there must be a better way to do this.
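For illustration, even keeping the mapping in a plain delimited file instead of a locked workbook would let us script the "limit by destination entity" step. A rough sketch (the column names are made up, not our actual sheet layout):

    import csv

    def rows_for_target(mapping_path, target_table):
        """Return every mapping row that populates the given destination table."""
        # Hypothetical columns: src_table, src_column, tgt_table, tgt_column, rule.
        with open(mapping_path, newline="") as f:
            return [row for row in csv.DictReader(f)
                    if row["tgt_table"] == target_table]

    # Example: list everything that feeds DIM_CUSTOMER (hypothetical table name).
    for row in rows_for_target("mappings.csv", "DIM_CUSTOMER"):
        print(f'{row["src_table"]}.{row["src_column"]} -> '
              f'{row["tgt_table"]}.{row["tgt_column"]}: {row["rule"]}')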
-Me
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

FastTrack is OK, though limited to "standard" operations.

For simple jobs, the job design can be generated directly from FastTrack. For more complex jobs the same is true, but the generated job will need to be tweaked by an ETL developer.

We also make use of the fact that, after a business analyst profiles the data and adds notes (in Information Analyzer), the analysis results are published into XMETA so that, when the ETL developer imports the shared table definition into the DataStage project, those notes are available.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

The Job generation piece of FastTrack can be challenging to reconcile and isn't used much, but one nice part about FastTrack is that it can be used to see the lineage before you have coded anything in DataStage (or any other tooling or language, for that matter). I've seen sites use FastTrack for doing the conceptual mapping that would ultimately yield Java code, among other targets. It's aging, but it still works well for less technical users who want drag/drop of the metadata names for relational tables that are already in the repository.

Ernie
Ernie Ostic

blogit!
Open IGC is Here! (https://dsrealtime.wordpress.com/2015/0 ... ere/)
jackson.eyton
Premium Member
Posts: 145
Joined: Thu Oct 26, 2017 10:43 am

Post by jackson.eyton »

Hmm, I went through one of IBM's training/demo documents for FastTrack and tried to replicate one of the jobs we already have. I could not get the lookup entered correctly: we use a joined-source lookup, meaning our lookup query is a join of multiple classification tables, and the join in FastTrack did not appear to support multiple sources.
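For context, the joined-source lookup I mean boils down to a query like the following. A rough sketch using pyodbc, with made-up table and column names, just to show the shape of it:

    import pyodbc

    # The lookup source is itself a join of several classification tables,
    # not a single table. Names below are hypothetical.
    LOOKUP_SQL = """
        SELECT c.class_code, c.class_desc, g.group_desc
        FROM classification c
        JOIN class_group g ON g.group_id = c.group_id
    """

    conn = pyodbc.connect("DSN=source_db")  # assumes an ODBC DSN named source_db
    lookup = {code: (desc, grp) for code, desc, grp in conn.execute(LOOKUP_SQL)}

It's that multi-table source side that FastTrack's join mapping didn't seem to handle.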
-Me
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

We use FastTrack for mapping columns and the rules that define the mapping. Nothing else, but it's been great for just that one task of keeping the mapping public, up to date, centralized, etc. The tool is terribly slow, but that is becoming the norm; opening a file in Excel under Office 365 takes 10-20 times longer than it did in previous versions (and that's on a faster PC!).

But for us, the modeling and architect instructions are different from the mapping instructions. When the database models change, we get an email communication and a public HTML map of the model that we can study.

I'll be honest, and you will laugh... I export our FastTrack mappings to Excel, and from that to a CSV file, which I then open in an actual high-performance, awesome tool... called... Notepad++.
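(The Excel-to-CSV hop doesn't have to be manual, by the way. A rough sketch with openpyxl, assuming the FastTrack export lands in a file called mappings.xlsx:)

    import csv
    from openpyxl import load_workbook

    # Flatten the first sheet of the exported workbook to CSV.
    # File names are hypothetical; point them at your actual export.
    wb = load_workbook("mappings.xlsx", read_only=True)
    ws = wb.active
    with open("mappings.csv", "w", newline="") as f:
        writer = csv.writer(f)
        for row in ws.iter_rows(values_only=True):
            writer.writerow(row)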
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

:D Notepad++ ... great tool, and sometimes it is even too slow compared to good ol' regular Notepad.
Ernie Ostic

blogit!
Open IGC is Here! (https://dsrealtime.wordpress.com/2015/0 ... ere/)
jackson.eyton
Premium Member
Posts: 145
Joined: Thu Oct 26, 2017 10:43 am

Post by jackson.eyton »

HAHAHA, ok, I only laughed because I use N++ all the time, just not for mapping... yet... UCDI, you mentioned HTML maps of the model; is that something that your architect puts out via Data Architect?
-Me
UCDI
Premium Member
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

I shamefully do not know what tool generates our HTML models. I can find out. It's *probably* an IBM tool, since we are heavily invested there, but it could be something else entirely.

Unrelated...
Regular Notepad chugs on large files. The GNU version of it does not, so I use that one sometimes. N++'s default features and performance are well balanced, but some plugins slow it way down.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

More unrelated... we use UltraEdit here.
-craig

"You can never have too many knives" -- Logan Nine Fingers