Requesting advice on Data Lineage options

A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

johnboy3
Premium Member
Posts: 52
Joined: Fri Jun 19, 2015 2:48 pm
Location: Jackson, MS, USA


Post by johnboy3 »

OK guys, I'm confused but I know some of you can help me.

Management is considering software upgrades that may involve re-writing DataStage jobs somehow. I recently provided them with Oracle's delivered data lineage spreadsheets, and began discussing the possibility of generating data lineage reports from DataStage. I have been asked to provide additional research.

Now I find that there appear to be three places from which we may be able to obtain data lineage information! I already knew DataStage could (under the right circumstances) provide data lineage information, but now it seems that InfoSphere Information Server (IIS) can as well, and that its lineage can even extend to OBIEE reports?! On top of that, Oracle also appears to offer its own data lineage capabilities!

Can someone make this easy for me? What is the best way for me to produce data lineage information that would actually be useful to management when they weigh a job rewrite? We have everything in a single DataStage project, and operational metadata collection is not currently turned on for the project (or for any jobs, as far as I can tell).

Please help!
john3
----------------------------------------------------
InfoSphere 8.5.0.2; DataStage 8.5.0.0; OS-RHEL 6.6; DB-Oracle Enterprise Edition 11g (11.2.0.4)
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Have you tried an exact search for "data lineage" yet? You'll find quite a number of posts in the Metadata Workbench forum, including some I saw about Best Practices, see if anything there helps.
-craig

"You can never have too many knives" -- Logan Nine Fingers
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

A real key here is the release that you are running, and whether or not you are licensed for what is called the Information Governance Catalog (IGC).

With IGC, lineage is largely "automatic". There are some dependencies, of course, but if you follow good practices around the use of parameter sets (job parameters shared among developers, with accepted and well-discussed default values), and use Sequential File stages, Data Sets and most of the relational stage types (ODBC, JDBC, Oracle, DB2, Teradata, Netezza, etc.) for a large majority of your Jobs, the lineage will just "work" out of the box.

Where you have special packs, custom stages, etc., you might need to put some "manual" lineage in place, but it is very doable.

Collecting "Operational Metadata" (OMD, the "runtime" info) is nice and provides some additional useful information, but it is not "mandatory".

...and if you use RCP (Runtime Column Propagation) extensively, you will still get lineage, but it works best when OMD is collected. That is something you can, and should, turn on later, after you have seen how lineage works.

IGC can also help you illustrate ALL of the lineage that you find critical to your environment, whether it is connecting to OBIEE reports, or just cataloguing what you have in Excel, or in some other ETL tool. The degree of automation depends on the kind of metadata you can obtain.

This is a lengthy subject, but this is a good start. Let us know if you have any other questions.

Ernie
Ernie Ostic

blogit!
Open IGC is Here! (https://dsrealtime.wordpress.com/2015/0 ... ere/)
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Hey Ernie, was IGC available back in the 8.x release? That's what the OP is running, unless the post is marked incorrectly. If not, do you recall when it was introduced?
-craig

eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Data lineage exists all the way back to 8.0, as part of the Metadata Workbench, but it was dramatically enhanced in 11.3 with the introduction of the Information Governance Catalog, which is an "evolution" of the original Metadata Workbench and Glossary components.

Ernie
Ernie Ostic
