Level of ELT DataStage can do..


BIuser
Premium Member
Posts: 238
Joined: Thu Feb 02, 2006 4:03 am
Location: South Africa


Post by BIuser »

Hi Guys

In this discussion I don't think the release, platform, and OS make a difference. This is more of a methodology issue.

Sunopsis are going to market (you'll see Sunopsis listed as visionaries in the Gartner ETL quadrant) with an ELT message. In other words, (E)xtract from the sources, (L)oad to the target or staging database, and then (T)ransform the data. The great part of the tool is that it compiles native code so the transformation occurs on the target database.
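To make the pattern concrete, here is a minimal ELT sketch using Python's sqlite3 as a stand-in for the target RDBMS. The table and column names are purely hypothetical; the point is the ordering: raw rows are landed first, and the transform is a single set-based SQL statement that the database executes itself.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # stands in for the target RDBMS
con.execute("CREATE TABLE stg_sales (region TEXT, amount REAL)")
con.execute("CREATE TABLE dw_sales (region TEXT, total REAL)")

# E + L: land the source extract untransformed in a staging table
rows = [("north", 100.0), ("north", 50.0), ("south", 70.0)]
con.executemany("INSERT INTO stg_sales VALUES (?, ?)", rows)

# T: the transformation is pushed down to the database as set-based SQL
con.execute("""
    INSERT INTO dw_sales (region, total)
    SELECT region, SUM(amount) FROM stg_sales GROUP BY region
""")
print(con.execute("SELECT region, total FROM dw_sales ORDER BY region").fetchall())
# → [('north', 150.0), ('south', 70.0)]
```

A tool like Sunopsis generates this kind of SQL in the target database's native dialect rather than moving the rows through an external engine.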

Then there's the news in the industry that ETL as a buzzword and methodology is dying, mainly because the RDBMS platforms we extract from and load to have improved their own transformation abilities - i.e. pivoting data, grouping by cube, etc.

My questions for comment to the forum are:


* Is ETL dying?
* What are the pitfalls for an ELT solution?
* Can DataStage do ELT?
* ETL = row based processing, ELT = set/batch based processing?


I have given it some thought, and here is what I could come up with:

Is ETL dying?

No. My response is based mostly on the limitations/pitfalls I feel are in the ELT space.

What are the pitfalls for an ELT solution?

Firstly, how busy is the source/target system that you need to do the transforms on? If the system is already overloaded, adding the ELT workload will require a purchase of new hardware.

Secondly, debugging transforms on data is going to be an issue without extra data stored in a test environment. I guess the ELT vendors assume each client has a test environment they can piggyback on. In per-row processing (ETL) the data is on the engine and can be reprocessed as often as the developer wants.

Thirdly, can the current RDBMS system handle the complexity of the transforms developed? If not, it may be necessary to upgrade or migrate to better technology.


Can DataStage do ELT?

Keeping in mind that ELT is just a methodology, I would say 'yes, but only to an extent'. The difficulty is manually coding the SQL that does the transformations, which DataStage then executes. In this case, DataStage turns into a glorified scheduler.

Is ETL = row based processing and ELT = set/batch based processing?

This is how I see it anyway.
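The row-versus-set distinction can be shown side by side. The snippet below is a hypothetical contrast on the same data (SQLite via Python's sqlite3, invented table names): the ETL style pulls rows into the engine and transforms them one at a time, while the ELT style issues a single set-based statement that the database executes itself.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE src (price REAL)")
con.executemany("INSERT INTO src VALUES (?)", [(10.0,), (20.0,)])

# ETL style: row-based processing outside the database
etl_result = [row[0] * 2 for row in con.execute("SELECT price FROM src ORDER BY price")]

# ELT style: set-based processing pushed down into the database
con.execute("CREATE TABLE tgt AS SELECT price * 2 AS price FROM src")
elt_result = [r[0] for r in con.execute("SELECT price FROM tgt ORDER BY price")]

assert etl_result == elt_result == [20.0, 40.0]
```

Both styles produce the same rows; what differs is where the CPU cycles are spent and who controls the processing.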

Any comments welcome. Ideally I'd like to get to a point where we agree on the direction of ETL and its influence on the leaders in the ETL space.

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Egg, Lettuce and Tomato? :lol:

Of course DataStage can do ELT. You'd implement EL (either in DataStage or natively) then ETL with the one database being both source and target.

Where ETL beats ELT hands down is where you have disparate data sources, particularly with the "frictionless connectors" that are another feature of the Hawk release (you only need to set up a connection once; you can then store all that information as a reusable component in your Repository). And, of course, a dedicated ETL tool is much more easily customized for complex transforms than an in-database tool - at least with today's technologies.

I disagree with your differentiation between row and set processing. There's no real difference - a set still has to be processed row by row in the load phase - and DataStage can certainly create data sets for loading.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

My opinion is that ELT is just a sales gimmick. Saying ETL is dead is like Bill Gates announcing the end of Unix back in 1986. Yeah, got that one right Billy.

One major pitfall with ELT processing is that all hard-core activities occur within the database. Sunopsis sells this as a major advantage, because database parallel processing capabilities simplify task management.

But you would know this is a lame argument if you've ever tried to write a single SQL statement as an ETL load process. The query is unmanageable, the sheer number of nested inline views destroys any chance of the optimizer using indexes, and you can't leverage oft-reused reference sources through any kind of cacheable mechanism. Forget about restartability, controlling your processing, or metrics and metadata such as outer-join reference lookup ratios.

PX and Server have different methods for optimizing lookups, process segmentation, and parallelization. You have complete discretion over when to stage to disk for an audit trail or reusability. Blasting everything into Oracle and then running PL/SQL is usually not the fastest ETL solution. You need to "baby-step" a lot of processing so that you avoid massive queries and can step data through work tables.
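The "baby-step" approach can be sketched as follows, again with SQLite standing in for the database and hypothetical table names. Instead of one giant statement full of nested inline views, each step materializes a simple work table, so every stage stays readable, auditable, and restartable on its own.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (cust TEXT, amount REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [("a", 5.0), ("a", 7.0), ("b", 3.0)])

# Step 1: aggregate into a work table rather than an inline view
con.execute("CREATE TABLE wrk_totals AS "
            "SELECT cust, SUM(amount) AS total FROM orders GROUP BY cust")

# Step 2: run the next transform against the staged work table;
# if it fails, you can restart here without redoing step 1
con.execute("CREATE TABLE dw_big_spenders AS "
            "SELECT cust, total FROM wrk_totals WHERE total > 4")
print(con.execute("SELECT cust FROM dw_big_spenders").fetchall())
# → [('a',)]
```

Each work table is also a natural audit point: you can inspect the intermediate rows after a failed run instead of re-deriving them from a monolithic query.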
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

It is an interesting topic. I think both approaches will have a strong future. Informatica have bet on both horses, with ELT ability built into the latest PowerCenter. IBM-Ascential have opted to bypass ELT and go with feature-rich ETL; the Hawk release sees better data quality plugins. If you tried to do heavy-duty matching and standardisation via ELT, you would put a big load on each database. Ascential-IBM also have Data Integrator up their sleeve, which gives you access to data without moving it via an ETL server.

Sunopsis works very well on a Teradata database, where it can make use of the very rich set of transformation features. However, you need to upsize your database to handle ELT functions, so on high-volume projects you may find the upgrades and expansions of your databases are just as expensive as a dedicated ETL server.

My other doubt about ELT is that you need expertise in the databases you are placing all this load on, to make sure the transformation functions are optimised, partitioned, and indexed correctly. You can get away with doing ETL work without being a database expert.
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

IBM could make ELT work if they combined all their database technologies. If they used Redbrick to stage and then made it seamless to integrate with DB2 or Informix, then those databases would need stronger transformation capabilities, or DataStage could do the transforms. Redbrick load times are amazing. I think Oracle has the best transformation features, but I don't know the other databases as well. We found that the more complex the transformation, the better DataStage Server performed compared to PL/SQL, which was surprising. If you look at DTS and SQL Server, the weak points are job scheduling and consistent performance: it cannot run 500 jobs from start to finish and recover gracefully from errors, and SQL Server jobs are very unstable.

I think the big plus is independence from any database with ETL and DataStage or Informatica. If you want to switch from SQL Server to MySQL or Oracle, then these are powerful tools. These tools are more "open". But how committed to "open" are these companies? Then again, you are tied to one of these products and are therefore dependent, not independent. These tools are very expensive. Which of these solutions makes your company more vulnerable?
Mamu Kim