File Metadata within Data Stage

lagrangeusa
Participant
Posts: 11
Joined: Mon Jul 11, 2005 11:58 am
Location: Chicago, Illinois, USA

Post by lagrangeusa »

When importing metadata through either Data Stage Designer or Data Stage Manager, you are required to select "Import" and then select the type of metadata that you want to import, such as "Sequential", "Plug-In", "ODBC", etc.

A few suggestions that would greatly (IMHO) increase productivity:

1. Refresh the Repository automatically. Currently you have to go up to the menu at the top of the Data Stage application and select "View, Refresh".

2. When I either load metadata from within a File Stage, Data Base Stage or even a Processing Stage like Sort or Aggregate, or use the "drag" technique over a link, include ALL the metadata information about that object.

For example, I have a source, a Transformer stage and then a target. I draw a link between the source and the Transformer stage, and then between the Transformer stage and the target. Now, going to the Table Definitions folder in the Repository view, I drag the metadata over a link. Currently I get the column definitions, yes, but I still need to manually fill in the file name, whether it is comma delimited, whether the first row contains column names, and whether it is UNIX or DOS format. All of these things are known at the time that the metadata is imported. Just pass them along. I have to believe that 99.9% of the time the information is NEVER going to change, yet it seems as if the staff who wrote the Data Stage objects wanted to provide flexibility, so you have to supply it every time in case it is different.

If I were designing the interface and I had a Source, Transformer and Target, I would allow the ETL developer to add the objects to the ETL palette and draw the links. Then, if the user dropped the metadata over each link, all the required information like file location, DB name, etc. would be defaulted. Even user IDs and passwords for DB tables could be defaulted (of course they could be overwritten by the developer). All the ETL developer would have to do is open the Transformer stage and draw the necessary source-to-target relationships. Once that was done, the job would be ready.
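To make the suggestion concrete, here is a minimal sketch in Python of the requested behavior. It is not how Data Stage works internally; every name in it (FileFormat, TableDefinition, Link, drop_on_link) is hypothetical and exists only for illustration. The idea is that the table definition keeps the format details captured at import time, and dropping it on a link copies them as defaults that the developer can still override.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class FileFormat:
    """Format details known at import time (hypothetical model)."""
    file_name: str
    delimiter: str = ","
    first_row_is_header: bool = True
    line_terminator: str = "UNIX"  # or "DOS"


@dataclass(frozen=True)
class TableDefinition:
    """A repository entry: columns plus the format captured on import."""
    name: str
    columns: Tuple[str, ...]
    file_format: FileFormat


@dataclass
class Link:
    """A link between two stages; starts with no metadata."""
    name: str
    columns: Tuple[str, ...] = ()
    file_format: Optional[FileFormat] = None


def drop_on_link(link: Link, table_def: TableDefinition, **overrides) -> Link:
    """Copy columns AND format onto the link; overrides replace defaults."""
    link.columns = table_def.columns
    link.file_format = replace(table_def.file_format, **overrides)
    return link


customers = TableDefinition(
    name="customers",
    columns=("id", "name"),
    file_format=FileFormat(file_name="customers.csv"),
)

# Everything is defaulted from the table definition.
source_link = drop_on_link(Link("source_to_transformer"), customers)

# The developer overrides only the file name and keeps the rest.
staging_link = drop_on_link(
    Link("transformer_to_target"), customers, file_name="staging.csv"
)
```

With this shape the common case needs no typing at all, and the uncommon case costs one override rather than re-entering every property.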

The current interface gets more frustrating when doing Sorts and Aggregations. On a Sort (or Aggregation) stage I have Input and Output tabs. I provide the Input information via my input link to the stage. I am then forced to SAVE it to the Table Definitions, then click on the Output tab and load it. Just do it automatically. What is the point of storing this information in the Table Definitions? It just clogs up the visible metadata. I almost never change this information (I cannot think of one instance where I have).

I don't know if anyone else has thought of this or has been frustrated with the GUI. Please understand, I think the IBM/Ascential suite of tools is great. I am only trying to make data transformation easier, simpler and quicker.

If I am off base or someone has found a way to address these issues, drop me a note and enlighten me.

Thanks
Dean La Grange
La Grange Group
IBM/Ascential Business Partners
COGNOS Business Partners
Bill Inmon Certified Data Warehouse Architects

8770 W Bryn Mahr, Suite 1300
Chicago, Illinois 60631
773.867.8005
www.lggusa.com
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Most of what you want can be done within the object properties. For example, in a Sequential File stage you can click Load on both the Format and the Columns tabs.

The file name is often prefixed with a directory pathname job parameter. (Should always be, if you ask me.)
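As a rough illustration of that convention: a job parameter is commonly referenced in a stage property by wrapping its name in # characters, so a file name might be entered as #SourceDir#/customers.csv. The Python sketch below only mimics that substitution to show why the stored file name is usually a template rather than a fixed path. The function name, the parameter name SourceDir and the paths are assumptions made up for the example.

```python
import re


def resolve_job_parameters(template: str, params: dict) -> str:
    """Replace each #Name# placeholder with its value from params."""

    def substitute(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"undefined job parameter: {name}")
        return params[name]

    return re.sub(r"#(\w+)#", substitute, template)


# The same template resolves to different files per environment.
template = "#SourceDir#/customers.csv"
dev_path = resolve_job_parameters(template, {"SourceDir": "/data/dev/in"})
prod_path = resolve_job_parameters(template, {"SourceDir": "/data/prod/in"})
```

Because the directory comes from the parameter at run time, a file name captured at import would in any case need to be stored without its directory for this approach to keep working.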

In terms of the file name, there are arguments for and against. You have put the arguments for. Against is the possibility that you may want to use a different file name (another file with the same metadata) or even to use non-file metadata with a Sequential file. (For instance, you may move data from a source table to a staging area; the sequential file will share the same metadata as the table, but probably not its name.) The reverse may also be true, for example when loading data from a file into a table; the file's metadata may be used on the Input link to the DBMS stage.

If they were to implement the functionality you suggest, I'd want to be able to set an option to drop the file name into the field or not. Quite often I don't want the same file name as the one from which I obtained the metadata.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.