XML Files in Information Analyzer

This forum contains ProfileStage posts and now focuses on newer versions of InfoSphere Information Analyzer.

Moderators: chulett, rschirm

Nagac
Premium Member
Posts: 127
Joined: Tue Mar 29, 2011 11:39 am
Location: India

XML Files in Information Analyzer

Post by Nagac »

Hi

We have XML files as a source, and we want to profile them in IA. Is that possible? If not, we will end up staging the data first and then profiling it (which is a little complex, as the XML files are massive).

I know we can profile RDBMS tables and flat files, but I couldn't find a way to handle XML files. Can someone advise?

Thanks
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

You need to parse them first, into the various chunks of text that you want to examine. Usually this means sending the columns to the xmlInput stage or, alternatively, the Hierarchical Stage, and parsing them there into their appropriate rows and columns, based on the hierarchical structure of your particular XML document and the various tags (XML elements and attributes) that it contains. One "column" in one "row" of XML in an RDBMS could potentially contain thousands of rows and columns of actual "data" to be examined.
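To make the idea concrete outside of DataStage, here is a minimal Python sketch of turning one repeating XML node into flat rows. It is only an illustration using the standard library, not what the xmlInput or Hierarchical stages do internally. The sample document, its element and attribute names, and the function name rows_from_repeating_node are all invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are invented.
SAMPLE = """<orders>
  <order id="1001"><customer>Acme</customer><total>250.00</total></order>
  <order id="1002"><customer>Globex</customer><total>99.50</total></order>
</orders>"""


def rows_from_repeating_node(xml_text, node_tag):
    """Turn each occurrence of node_tag into one flat row (a dict)."""
    root = ET.fromstring(xml_text)
    rows = []
    for node in root.iter(node_tag):
        row = dict(node.attrib)            # XML attributes become columns
        for child in node:
            if len(child) == 0:            # only leaf elements fit in a flat row
                row[child.tag] = (child.text or "").strip()
        rows.append(row)
    return rows


rows = rows_from_repeating_node(SAMPLE, "order")
for row in rows:
    print(row)
```

Each dict is one "row" with columns id, customer and total, which is the shape a profiling tool expects. Nested children are skipped here on purpose; they belong in their own set of rows.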

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
Nagac
Premium Member
Posts: 127
Joined: Tue Mar 29, 2011 11:39 am
Location: India

Post by Nagac »

Thanks Ernie,

Do you mean we need to load these chunks into a table and then do the profiling?
One "column" in one "row" of xml in an rdbms could potentially contain thousands of rows and columns of actual "data" to be examined.
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Yes, and for several key reasons, the most important being that any fairly standard profiling tool needs to look at "rows" of data. This means parsing your XML structure into its potentially many different table relationships. Some XML documents are simple, single repeating nodes, but not many. Each parent/child/grandchild (etc.) path is a potential set of rows for analysis. Parse them out and then analyze the resulting tables or sequential files.
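As a rough sketch of the "one set of rows per path" idea, the Python below walks a small document and collects a separate table for each parent/child path, linking child rows to their parent with generated keys. Everything here is an assumption made for illustration: the sample XML, the _row_id and _parent_id column names, and the helper names split_into_tables and to_csv. In a real job the parsing would be done by the stage you choose, and the output would land in tables or sequential files.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample with two levels of repeating nodes.
SAMPLE = """<orders>
  <order id="1001">
    <customer>Acme</customer>
    <item sku="A1"><qty>2</qty></item>
    <item sku="B7"><qty>1</qty></item>
  </order>
  <order id="1002">
    <customer>Globex</customer>
    <item sku="A1"><qty>5</qty></item>
  </order>
</orders>"""


def split_into_tables(xml_text):
    """Return {path: [row, ...]}, one table per distinct element path."""
    tables = defaultdict(list)

    def walk(node, path, parent_id):
        row_id = len(tables[path]) + 1     # generated key, unique within the path
        row = {"_row_id": row_id, "_parent_id": parent_id}
        row.update(node.attrib)            # attributes become columns
        tables[path].append(row)
        for child in node:
            if len(child) == 0 and not child.attrib:
                # Plain leaf element: a column on the current row.
                row[child.tag] = (child.text or "").strip()
            else:
                # Anything with structure becomes a row in a child table.
                walk(child, path + "/" + child.tag, row_id)

    root = ET.fromstring(xml_text)
    walk(root, root.tag, None)
    return dict(tables)


def to_csv(rows):
    """Render one table as CSV text, using the union of its column names."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


tables = split_into_tables(SAMPLE)
for path, rows in tables.items():
    print(path)
    print(to_csv(rows))
```

The orders/order and orders/order/item paths come out as two separate tables, and each can be profiled on its own. The _parent_id column keeps the relationship between them available for cross-table analysis. Text inside elements that also carry attributes or children is ignored in this simplified version.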

Ernie