ETL is more than tools or an attempt at organized chaos.

A forum for discussing DataStage<sup>®</sup> basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

Ultramundane
Participant
Posts: 407
Joined: Mon Jun 27, 2005 8:54 am
Location: Walker, Michigan

ETL is more than tools or an attempt at organized chaos.

Post by Ultramundane »

ETL is more than tools or an attempt at organized chaos. Good ETL is designed to replace chaos with organized structure. And while good ETL does require good tools like DataStage, more than tools it is a company's dedication to a data governance policy and to an enterprise data management practice. Otherwise, it is nothing more than vapor.

Imagine, if you will, that someone needs to get some data for something, just so that they can move on to the next task of that something as fast as possible. Maybe they are building a web page or some other type of application. Now imagine that the something they need is data, and that that someone is actually many folks in many different silos.

Now, imagine that you have silos A-G and those silos all want access to ETL. Imagine that you need to move data from silo A to silo C.

In such a scenario, you have three very undesirable options using ETL.
Option 1: Give silo A access to parts of silo C.
Option 2: Give silo C access to parts of silo A.
Option 3: Produce flat files or other intermediate data sets from A which are fed to someone in C to process (very inefficient).

Now, none of the options above is good. Option 1 propagates access across many systems to people who shouldn't have had it to begin with (IMHO). Option 2 has the same problem. Option 3 is just inefficient, and it is exactly the kind of thing that good ETL is designed to do away with.

So, what are we left with? Silos of data managed by people who don't care about ETL, who are not ETL experts or developers, and who are nevertheless forced to act as such. That is both a security challenge and a major inefficiency. Not to mention, imagine that you had to use ETL just one time in your area: now you have to support a couple of jobs which you don't have the capability or knowledge to support.

Thus, ETL should be handled by a centralized staff of people who are dedicated to ETL standards, methodologies, metadata management, etc. This group does the administration, architecture, and development of the ETL jobs. They are experts. ETL is therefore not designed to destroy silos, but to free the people who work with those datasets from the data movement aspects so they can focus on what they do best. Maybe that is writing a .NET program or a C program. But surely not ETL.
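The centralized pattern argued for above can be sketched in a few lines of (hypothetical) Python. The point of the sketch is that the central ETL service, not the silo teams, owns every connection, so moving data from A to C grants neither silo access to the other. All names here (EtlService, SiloConnection, move) are illustrative, not a real DataStage API.

```python
class SiloConnection:
    """Stands in for a real database or file connection owned by one silo."""

    def __init__(self, name):
        self.name = name
        self.tables = {}

    def read(self, table):
        return list(self.tables.get(table, []))

    def write(self, table, rows):
        self.tables.setdefault(table, []).extend(rows)


class EtlService:
    """Central team registers connections once; silo teams only request moves.

    Credentials/connections live inside the service, so no silo ever
    holds access to another silo's systems (avoiding Options 1 and 2),
    and no flat-file handoff is needed (avoiding Option 3).
    """

    def __init__(self):
        self._connections = {}

    def register(self, conn):
        self._connections[conn.name] = conn

    def move(self, source, table, target, transform=lambda row: row):
        rows = self._connections[source].read(table)
        self._connections[target].write(table, [transform(r) for r in rows])
        return len(rows)
```

A silo team would then request something like `svc.move("A", "orders", "C")` and never touch silo C's credentials at all.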

Thoughts? Is it really this simple, yet at the same time difficult for people to understand? Or is it just me who believes in such fantasies?
Ultramundane
Participant
Posts: 407
Joined: Mon Jun 27, 2005 8:54 am
Location: Walker, Michigan

Re: ETL is more than tools or an attempt at organized chaos.

Post by Ultramundane »

So, I believe that in order to do effective ETL/DGP/EDM, the following must be accomplished. For most organizations, I don't think this can be done by maintaining the status quo or by simply buying a so-called silver-bullet product. While software is important, these things must be accomplished by a dedicated group of individuals who are masters of the trade. Those individuals use software to meet the objectives.

For an ETL/DGP/EDM strategy to be successful, it must be able to define and enforce the governance of the ten policies given below.

My 10 Commandments for ETL:
1. We must maintain the Right Level of Access, giving end users the access they need while also protecting business assets from vulnerabilities.

2. It is imperative that we be able to easily Audit, Identify, and Regulate the usage of business critical and sensitive data.

3. We must be able to Identify the Data Value as it pertains to the success of the business and to regulatory conditions.

4. Any systems, databases, tables, and data flows which are deemed Highly Available for business operations must be identified and maintained accordingly.

5. The Service Level Agreements for Data must be known and monitored.

6. We must be able to perform Outage Impact Analysis both before and after any outage which occurs on a source, target, or intermediary system necessary for data transport.

7. Critical Data must be or must be able to be Validated as being correct.

8. We must be able to perform Change Impact Analysis both before and after any change occurs on a source, target, or intermediary system.

9. We must be able to perform Data Usage Analysis so that we know where our data is being used and can determine why it is being used.

10. We must create and maintain a set of models that demonstrate at a high level the data movement processes. The data movement specialists and data stewards will require the following documents in order to build the data movement processes. These documents will be maintained and cataloged by the data movement specialists and data stewards.

+ Conceptual Models
+ High Level Specifications
+ Data Flow Diagrams
+ Metadata Documents & Component Specifications
+ State Diagrams & Flow-Chart Diagrams
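Commandments 6, 8, and 9 all come down to being able to walk a lineage graph: given a dataset, find everything built downstream of it. A minimal sketch in Python, using made-up dataset names and assuming lineage edges point from each dataset to the datasets derived from it:

```python
from collections import deque

# Hypothetical lineage catalog: source dataset -> datasets built from it.
lineage = {
    "silo_a.orders": ["staging.orders"],
    "staging.orders": ["warehouse.fact_sales", "silo_c.order_feed"],
    "warehouse.fact_sales": ["reports.daily_revenue"],
}


def downstream_impact(node, graph):
    """Return every dataset affected if `node` changes or goes down.

    Breadth-first walk of the lineage edges; the result answers both
    "what breaks during this outage?" (Outage/Change Impact Analysis)
    and "where is this data being used?" (Data Usage Analysis).
    """
    seen, queue = set(), deque([node])
    while queue:
        current = queue.popleft()
        for child in graph.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen
```

With a catalog like this maintained by the central group, an outage on `silo_a.orders` immediately lists every downstream feed and report that needs attention.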
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Yes! Yes! and Yes....right up until the suggestion of C and .NET. ETL enhances the ability to achieve all of these great goals......other languages leave those goals at arm's length.
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>