Envisioning an ideal ETL environment

A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.


anamika
Participant
Posts: 16
Joined: Sat Feb 27, 2016 9:43 am
Location: Ottawa


Post by anamika »

Hi All,
I have been tasked with developing best practices, standards, and design patterns for an "ideal/desirable" data integration environment.
I have experience with DataStage, SSIS, Data Manager, and other custom ETL tools in a variety of data warehousing environments.
So far I have compiled various requirements, best practices, and "must do" items from past experience. However, I would also like to hear from this group, so that the effort is as holistic and eclectic as possible.
I believe in leveraging the "wisdom of the crowd" and learning from it.
If you can contribute in any way, that would be much appreciated. Feel free to air your thoughts, comments, flames, digs, etc.

Thanks
ETL, DW, BI Consultant
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

My best advice is to realise that the ETL component is a small part of a much larger whole, which involves understanding the information; governing both the information and the processes; managing metadata; measuring, remediating, and monitoring data quality; and more.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

I'm a look-at-the-abstract-first sort of guy, but I quickly follow that by taking a very close look at what is happening right now in the trenches.

Best practices is a nice discussion to have, but unless you can count on your shop to actually follow them -- to be self-disciplined and self-enforcing -- you can expect the nice discussions to go pretty much nowhere on the ground.

So, the first "must do" on my list is get a candid response from the developers: do you actually follow best practices, or do you just do what you believe is necessary to get the jobs done?

My personal peeve is "business value at start-up speed." That's fine if you actually are a start-up, but for an established shop the first questions must be "can I guarantee the stability of my production environment?" and "why should speed be more important than quality?"

I don't mind saying that the latter question has never been rationally (or even sanely) answered for a batch processing environment.

So, if I'm going to rant, I should offer something substantial to your query.

First best practice on my list: what are my points of failure, and what must I do to make recovery as efficient as possible? This design point tends to get discussed only at the end of the design phase, when it should be a checkpoint in every phase.

Second best practice on my list: what is the definition of speed to market? DS is already a rapid application development environment. If DS objects are not getting into production "quickly" enough, where are the bottlenecks? Chances are they lie with the people who just don't understand DS.

Third (and final for this post): do your DS developers talk to each other? The main point is that while there may be two or more possible approaches to any given design question or challenge, your developers need to know which ones will work best for your particular needs, infrastructure, and limitations. Reusability is part of that discussion as well, along with standards for naming stages, jobs, scripts, and files.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872