Complex format files Profiling - addendum

This forum contains ProfileStage posts and now focuses on newer versions of InfoSphere Information Analyzer.

Moderators: chulett, rschirm

dscnn
Participant
Posts: 26
Joined: Tue May 17, 2005 1:11 pm

Complex format files Profiling - addendum

Post by dscnn »

I have a complex file that needs to be profiled. An earlier post on this suggested breaking similar rows up into separate batches and then profiling them.
But what is the use of buying the tool if it has to be done manually?

Is there a better way to profile complex format data? All help would be greatly appreciated.
rhys.jones@target.com
Participant
Posts: 24
Joined: Mon Mar 14, 2005 6:42 pm
Location: Minneapolis, Minnesota

Re: Complex format files Profiling - addendum

Post by rhys.jones@target.com »

dscnn wrote:I have a complex file that needs to be profiled. An earlier post on this suggested breaking similar rows up into separate batches and then profiling them.
But what is the use of buying the tool if it has to be done manually?

Is there a better way to profile complex format data? All help would be greatly appreciated.
I don't think that's possible in the current version, although I haven't tried it yet. I recently spoke with one of their Support Engineers, and they won't support any middleware other than the DataDirect ODBC drivers, which give you limited text file interaction (.csv is probably about it). You'd have to "flatten" your data. Try this with a CFF stage in DataStage writing to a sequential file, then kick off your ProfileStage package (see my response to your other post).

Information Analyzer and Hawk may improve on this, as the hype is that the products will share common "connectors" to interact with source data. That may mean being able to extract in parallel and deal with CFFs, XML, etc.
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
In the Hawk release the CFF stage should be available as a source and should therefore provide this functionality; at least that's what I was told by the lecturer of the course.
Personally, I fail to see the value of a profiled CFF if you don't separate the profiling, one profile for each schema you have. Otherwise you might get a column identified as 80% null when only 20% of the rows have that format and all of them carry a value, while in the other formats the column is simply not present.
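
To make the 80% figure concrete, here is a minimal Python sketch. The record layout, the `rec_type` values and the `discount` column are all made up for illustration; nothing here reflects how ProfileStage itself computes null percentages.

```python
# Hypothetical flattened file: 80 rows of record type "A" (no discount
# column in that layout) and 20 rows of type "B" (discount always set).
rows = (
    [{"rec_type": "A", "name": "header"} for _ in range(80)]
    + [{"rec_type": "B", "name": "detail", "discount": "X1"} for _ in range(20)]
)

def null_pct(records, column):
    """Percentage of records where the column is missing or None."""
    missing = sum(1 for r in records if r.get(column) is None)
    return 100.0 * missing / len(records)

# Profiling the whole file as if it had a single layout.
overall_null_pct = null_pct(rows, "discount")

# Profiling only the rows whose layout actually contains the column.
type_b_rows = [r for r in rows if r["rec_type"] == "B"]
type_b_null_pct = null_pct(type_b_rows, "discount")

print(overall_null_pct)  # 80.0
print(type_b_null_pct)   # 0.0
```

The column looks 80% null over the whole file, yet it is fully populated in the only record type that defines it.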

IHTH,
Roy R.
rhys.jones@target.com
Participant
Posts: 24
Joined: Mon Mar 14, 2005 6:42 pm
Location: Minneapolis, Minnesota

Post by rhys.jones@target.com »

I see what you're saying now, Roy. You really do need to profile each sub-schema of the flat file independently, which would mean spinning the related records off into separate sequential files. That should still be possible in DataStage (CFF Stage to Transformer to multiple Sequential Files).
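
Outside DataStage, the same split can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the file is already plain text, one record per line, and the record type is the first two characters of each line. The function name and the `type_len` parameter are hypothetical, and a real complex flat file (binary fields, a COBOL copybook layout) would need proper parsing first.

```python
import os

def split_by_record_type(lines, out_dir, type_len=2):
    """Write each line to out_dir/<record type>.txt; return {type: path}.

    Assumes the record type is the first `type_len` characters of a line.
    """
    handles = {}
    paths = {}
    try:
        for line in lines:
            rec_type = line[:type_len]
            if rec_type not in handles:
                # Open one output file per record type, on first sight.
                paths[rec_type] = os.path.join(out_dir, rec_type + ".txt")
                handles[rec_type] = open(paths[rec_type], "w", encoding="utf-8")
            handles[rec_type].write(line.rstrip("\n") + "\n")
    finally:
        for fh in handles.values():
            fh.close()
    return paths
```

Each output file then holds a single layout and can be profiled on its own.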

dscnn,
In ProfileStage you'd have separate logical databases set up to profile each file. If calling from DataStage, you'd then have multiple ProfileStage packages to call from the command line, or you could create a new shell script that calls all your other runpackage.sh scripts. You'd have to piece the end results together manually to get a holistic profile of the entire complex file, but it's certainly possible.

I wonder: could you profile the distribution values table in your ProfileStage repository to get a picture of your entire complex file?
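
As a rough sketch of that wrapper idea, here it is in Python rather than shell. The `run_all` helper is hypothetical and the paths in the usage comment are placeholders; I am not assuming anything about the arguments runpackage.sh takes.

```python
import subprocess

def run_all(commands):
    """Run each command in order and return a list of (command, return code).

    A failing command does not stop the remaining ones, so every
    package still gets its chance to run.
    """
    results = []
    for cmd in commands:
        completed = subprocess.run(cmd)
        results.append((cmd, completed.returncode))
    return results

# Example usage (placeholder paths, one wrapper script per record type):
# results = run_all([
#     ["sh", "/path/to/header/runpackage.sh"],
#     ["sh", "/path/to/detail/runpackage.sh"],
# ])
# failed = [cmd for cmd, rc in results if rc != 0]
```

Collecting the return codes rather than stopping at the first failure makes it easier to see which record-type profiles still need a rerun.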