Complex format files Profiling - addendum

dscnn · Post by **dscnn** » Wed Feb 01, 2006 9:58 am

I have a complex file that needs to be profiled. There was an earlier post on this, and it suggested breaking up similar rows into separate bunches and then profiling them.
But what is the use of buying the tool if it has to be done manually?

Is there any other better way to profile complex format data? All help would be greatly appreciated.

rhys.jones@target.com · Wed Feb 01, 2006 6:27 pm

dscnn wrote: I have a complex file that needs to be profiled. There was an earlier post on this, and it suggested breaking up similar rows into separate bunches and then profiling them.
But what is the use of buying the tool if it has to be done manually?

Is there any other better way to profile complex format data? All help would be greatly appreciated.

I don't think it's possible in the current version, although I haven't tried it yet. I recently had a conversation with one of their Support Engineers, and they won't support any middleware other than the DataDirect ODBC drivers, which give you limited text file interaction (.csv is probably about it). You'd have to "flatten" your data. Try this with a CFF stage in DataStage writing to a sequential file, then kick off your ProfileStage package (see my response to your other post).

Information Analyzer and Hawk may improve upon this, as the hype is that the products will share common "connectors" to interact with source data. So this may mean being able to extract in parallel, deal with CFFs, XML, etc.

roy · Post by **roy** » Thu Feb 02, 2006 3:57 pm

Hi,
In the Hawk release the CFF stage should be available as a source and thus should provide this functionality; at least that's what I was told by the lecturer of the course.
Personally, I fail to see the value of a profiled CFF if you don't separate the profiling, one profile for each schema you have. Potentially you might get a column identified as being 80% null when only 20% of the rows are of that format and all of them have a value, while in the other formats the column is simply not present.
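
To put numbers on that, here is a minimal Python sketch. The record types "A" and "B" and the `discount` column are made up for illustration; they are not from any real file. It shows the same column profiled once over the whole file and once per record type.

```python
from collections import defaultdict

# Hypothetical records from a complex flat file: (record_type, discount).
# Only type "B" rows carry a discount; for type "A" the field is simply absent (None).
rows = [("A", None)] * 8 + [("B", 5.0), ("B", 7.5)]

def null_rate(values):
    """Fraction of values that are None."""
    values = list(values)
    return sum(v is None for v in values) / len(values)

# Profiling the whole file at once: the column looks 80% null.
overall = null_rate(v for _, v in rows)

# Profiling one record type at a time: type B is fully populated.
by_type = defaultdict(list)
for rec_type, value in rows:
    by_type[rec_type].append(value)
per_type = {t: null_rate(vals) for t, vals in by_type.items()}

print(overall)   # 0.8
print(per_type)  # {'A': 1.0, 'B': 0.0}
```

The 80% figure says nothing about data quality here; it only reflects how the record types are mixed in the file.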

IHTH,

rhys.jones@target.com · Thu Feb 02, 2006 5:35 pm

I see what you're saying now, Roy. You really do need to profile each sub-schema of the flat file independently, which would mean spinning the related records out into separate sequential files. That should still be possible in DataStage (CFF Stage to Transformer to multiple Sequential Files).
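
Outside of DataStage, the splitting step itself can be sketched in a few lines of Python. This is only an illustration under an assumption: that the first two characters of each line are a record-type code. The sample lines, the `type_width` parameter and the `records_XX.txt` file names are all hypothetical.

```python
from collections import defaultdict
from pathlib import Path

def split_by_record_type(lines, type_width=2):
    """Group fixed-width lines by a record-type code in the leading columns.

    Assumption: the first `type_width` characters identify the sub-schema.
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if line:  # skip blank lines
            groups[line[:type_width]].append(line)
    return dict(groups)

def write_groups(groups, out_dir):
    """Write one sequential file per record type, e.g. records_HD.txt."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for rec_type, recs in groups.items():
        path = out_dir / f"records_{rec_type}.txt"
        path.write_text("\n".join(recs) + "\n")
        paths[rec_type] = path
    return paths

# Made-up sample: one header, two detail rows, one trailer.
sample = ["HD20060201", "DT0001WIDGET", "DT0002GADGET", "TR0002"]
groups = split_by_record_type(sample)
```

Calling `write_groups(groups, "split_out")` would then produce one file per sub-schema, each of which can be profiled on its own.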

dscnn,
In ProfileStage you'd have separate logical databases set up to profile each file. If calling from DataStage, then, you'd have multiple ProfileStage packages to call from the command line. Or create a new shell script that calls all your other runpackage.sh scripts. You'd have to piece the end results together manually to get a holistic profile of the entire complex file. But it's certainly possible.
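
The wrapper idea is the same in any language. Here is a rough sketch in Python rather than shell. It uses stand-in commands so it runs anywhere; the labels and the runpackage.sh path mentioned in the comment are hypothetical, and I'm not assuming anything about what arguments runpackage.sh takes.

```python
import subprocess
import sys

def run_all(commands):
    """Run each command in order; return {label: exit code}."""
    results = {}
    for label, cmd in commands.items():
        completed = subprocess.run(cmd)
        results[label] = completed.returncode
    return results

# Stand-in commands so the sketch is runnable as-is. In practice each entry
# would be something like ["sh", "/path/to/header/runpackage.sh"]
# (hypothetical path, one per ProfileStage package).
commands = {
    "header": [sys.executable, "-c", "raise SystemExit(0)"],
    "detail": [sys.executable, "-c", "raise SystemExit(3)"],
}

results = run_all(commands)
failed = [label for label, code in results.items() if code != 0]
print(results)  # {'header': 0, 'detail': 3}
print(failed)   # ['detail']
```

Collecting the exit codes instead of stopping at the first failure lets you see in one pass which sub-schema profiles need a rerun.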

I wonder: could you profile the distribution values table in your ProfileStage repository to get a picture of your entire complex file?

DSXchange
