DSXchange: DataStage and IBM Websphere Data Integration Forum
BIuser



Group memberships:
Premium Members

Joined: 02 Feb 2006
Posts: 238
Location: South Africa
Points: 2223

Posted: Fri Feb 17, 2006 7:00 am

DataStage® Release: 7x
Job Type: Server
OS: Unix
Additional info: ELT, ETL; customers are confused
Hi Guys

In this discussion I don't think the Release, Platform and OS make a difference. This is more of a methodology issue.

Sunopsis are going to market (you'll see Sunopsis as visionaries in the Gartner ETL quadrant) with an ELT message. In other words, (E)xtract from the sources, (L)oad to the target or staging database, and then (T)ransform the data there. The great part of the tool is that it compiles all the native code needed for the transformation to occur on the target database.
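To make the E, L, T order concrete, here is a minimal sketch. It assumes Python's standard sqlite3 module standing in for the target database; the table names (stg_sales, sales_by_region) and the sample rows are made up for illustration, and this is not how Sunopsis or DataStage actually generate their code.

```python
import sqlite3

# Hypothetical source rows; in practice these would be extracted from a source system.
source_rows = [
    ("C001", "north", 120.0),
    ("C002", "north", 80.0),
    ("C003", "south", 50.0),
]


def run_elt(rows):
    # An in-memory SQLite database stands in for the target database (assumption).
    target = sqlite3.connect(":memory:")

    # (E)xtract + (L)oad: copy the raw rows into a staging table untouched.
    target.execute(
        "CREATE TABLE stg_sales (customer TEXT, region TEXT, amount REAL)"
    )
    target.executemany("INSERT INTO stg_sales VALUES (?, ?, ?)", rows)

    # (T)ransform: the target database does the work in its own SQL.
    target.execute(
        "CREATE TABLE sales_by_region (region TEXT PRIMARY KEY, total REAL)"
    )
    target.execute(
        "INSERT INTO sales_by_region "
        "SELECT UPPER(region), SUM(amount) FROM stg_sales GROUP BY region"
    )
    target.commit()

    result = target.execute(
        "SELECT region, total FROM sales_by_region ORDER BY region"
    ).fetchall()
    target.close()
    return result


result = run_elt(source_rows)
print(result)
```

The point of the sketch is only that no transformation logic runs outside the database: the client code loads raw rows and then submits SQL.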

Then there's the news in the industry that ETL as a buzzword and methodology is dying, mainly because the RDBMS systems we extract from and load to have improved their own transform abilities, e.g. pivoting data, grouping by cube, etc.

My questions for comment to the forum are:


* Is ETL dying?
* What are the pitfalls for an ELT solution?
* Can DataStage do ELT?
* ETL = row based processing, ELT = set/batch based processing?


I have given it some thought, and here is what I could come up with:

Is ETL dying?

No. My response is based mostly on the limitations/pitfalls I feel are in the ELT space.

What are the pitfalls for an ELT solution?

Firstly, how busy is the source/target system that you need to do the transforms on? If the system is already overloaded, ELT will require a purchase of new hardware.

Secondly, debugging of transforms is going to be an issue without extra data stored in a test environment. I guess the ELT vendors know and trust that each client has a test environment which they can piggyback on. In per-row processing (ETL) the data is on the engine and can be reprocessed as often as the developer wants.

Thirdly, can the current RDBMS system handle the complexity of the transforms developed? If not, it may be necessary to upgrade or migrate to better technology.


Can DataStage do ELT?

Keeping in mind that ELT is just a methodology, I would say 'yes, but to an extent'. The only difficulty is manually coding the SQL to do the transformations which are executed by DataStage. In this case, DataStage turns into a glorified scheduler.

Is ETL = row based processing and ELT = set/batch based processing?

This is how I see it anyway.
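For anyone who wants to see that distinction side by side, here is a rough sketch. It assumes Python with the standard sqlite3 module standing in for the database; the function names, table names and sample rows are all hypothetical. The same rule (trim and upper-case the name, convert the amount to cents) is applied once per row on the "engine" and once as a single set-based SQL statement inside the database.

```python
import sqlite3

# Hypothetical input rows: (id, name, amount)
rows = [(1, " alice ", 10.0), (2, "BOB", 20.0), (3, "Carol", 30.5)]


def transform_row(row):
    # The per-row rule: trim and upper-case the name, convert amount to cents.
    cust_id, name, amount = row
    return (cust_id, name.strip().upper(), int(round(amount * 100)))


def etl_row_based(rows):
    # ETL style: each row is transformed outside the database, then loaded.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customer (id INTEGER, name TEXT, cents INTEGER)")
    for row in rows:
        db.execute("INSERT INTO customer VALUES (?, ?, ?)", transform_row(row))
    out = db.execute(
        "SELECT id, name, cents FROM customer ORDER BY id"
    ).fetchall()
    db.close()
    return out


def elt_set_based(rows):
    # ELT style: load the raw rows first, then transform the whole set in SQL.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE stg_customer (id INTEGER, name TEXT, amount REAL)")
    db.executemany("INSERT INTO stg_customer VALUES (?, ?, ?)", rows)
    db.execute("CREATE TABLE customer (id INTEGER, name TEXT, cents INTEGER)")
    db.execute(
        "INSERT INTO customer "
        "SELECT id, UPPER(TRIM(name)), CAST(ROUND(amount * 100) AS INTEGER) "
        "FROM stg_customer"
    )
    out = db.execute(
        "SELECT id, name, cents FROM customer ORDER BY id"
    ).fetchall()
    db.close()
    return out


row_result = etl_row_based(rows)
set_result = elt_set_based(rows)
print(row_result == set_result)
```

Both paths end with the same target table; what differs is where the transformation logic lives and which machine carries the load.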

Any comments welcome. Ideally I'd like to get to a point where we agree on the direction of ETL and its influence on the leaders in the ETL space.

Thanks
ray.wurlod

Premium Poster
Participant

Group memberships:
Premium Members, Inner Circle, Australia Usergroup, Server to Parallel Transition Group

Joined: 23 Oct 2002
Posts: 54071
Location: Sydney, Australia
Points: 293279

Posted: Fri Feb 17, 2006 5:17 pm

Egg, Lettuce and Tomato? [laughing]

Of course DataStage can do ELT. You'd implement EL (either in DataStage or natively) then ETL with the one database being both source and target.

Where ETL beats ELT hands down is where you have disparate data sources, particularly with the "frictionless connectors" that are another feature of the Hawk release (you only need to set up a connection once, you can then store all that information as a reusable component in your Repository). And, of course, a dedicated ETL tool is much more easily customized (complex transforms) than an in-database tool - at least for today's technologies.

I disagree with your differentiation between row and set processing. There's no real difference - a set still has to be processed row by row in the load phase - and DataStage can certainly create data sets for loading.

_________________
RXP Services Ltd
Melbourne | Canberra | Sydney | Hong Kong | Hobart | Brisbane
currently hiring: Canberra, Sydney and Melbourne
kcbland

Premium Poster
Participant

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 15 Jan 2003
Posts: 5209
Location: Lutz, FL
Points: 39192

Posted: Fri Feb 17, 2006 11:32 pm

My opinion is that ELT is just a sales gimmick. Saying ETL is dead is like Bill Gates announcing the end of Unix back in 1986. Yeah, got that one right Billy. One major pitfall with ELT proces ...

_________________
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
vmcburney

Premium Poster
Participant

Group memberships:
Premium Members, Inner Circle, Australia Usergroup

Joined: 23 Jan 2003
Posts: 3564
Location: Australia, Melbourne
Points: 27712

Posted: Sun Feb 19, 2006 12:11 am

It is an interesting topic. I think both products will have a strong future. Informatica have bet on both horses, with ELT ability built into the latest PowerCenter. IBM-Ascential have opted to bypass ELT and go with feature-rich ETL; the Hawk release sees better data quality plugins. If you tried to do heavy-duty matching and standardisation via ELT you would put a big load on each database. Ascential-IBM have also got Data Integrator up their sleeve, which gives you access to data without moving it via an ETL server.

Sunopsis works very well on a Teradata database, where it can make use of the very rich set of transformation features. However, you need to upsize your database to handle ELT functions, so on high-volume projects you may find that the upgrades and expansions of your databases are just as expensive as a dedicated ETL server.

My other doubt about ELT is that you need expertise in the databases you are placing all this load on to make sure the transformation functions are optimised, partitioned and indexed correctly. You can get away with doing ETL work without being a database expert.

_________________
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn: Vincent McBurney LinkedIn
kduke

Premium Poster


since February 2006

Group memberships:
Premium Members, Inner Circle, Australia Usergroup, Server to Parallel Transition Group

Joined: 29 May 2003
Posts: 5227
Location: Dallas, TX
Points: 35013

Posted: Sun Feb 19, 2006 9:17 am

IBM could make ELT work if they combined all their database technologies. If they used Redbrick to stage and then made it seamless to integrate with DB2 or Informix then these would need stronger tran ...

_________________
Mamu Kim
All times are GMT - 6 Hours