XML to Dataset creates huge Dataset file and excessive I/O

vishwa4ds
Participant
Posts: 7
Joined: Thu Oct 06, 2005 6:32 am


Post by vishwa4ds »

Hello All,

I am facing some challenges with I/O usage on the DataStage server.

The requirement is to read XML files and load them into a Dataset, then, in the next job, load the data from the Dataset into Oracle tables. We are getting the expected results.

We were informed by the DataStage admins that a high number of I/O operations is occurring on the DataStage server, both while writing data into the Dataset and while reading from it.

The admins have also informed us that when we read the XML files and write them into the Dataset, the size of the Dataset grows exponentially.

Our approach to reading the XML is as follows:

We read one XML file, approximately 1 MB in size, kept in a Unix directory, using the XML stages, and write it to a Dataset. While reading, we read the XML contents into one column in the XML Input stage and write that column into the Dataset.
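To illustrate, the Dataset's record schema is essentially a single variable-length string column, along these lines (the column name here is illustrative, not our actual name):

Code:

record (
  xml_content: string;
)

So each Dataset record carries one whole XML document in that one column.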

Environment:

OS: Linux
DataStage: 8.0.1 on an 8-node system.

Please share your experience with reducing the I/O.

Thanks in advance.

Vishwanath
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

This really is a "how long is a piece of string?" question. How complex is the XML? Do they really mean "exponentially", and do they have figures to prove it?

For some XML there certainly is the potential to generate a large storage requirement, particularly if types are specified as unbounded strings.
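For example, if the source documents really do stay around the 1 MB quoted above, bounding the column is one thing to try. A sketch only; the 1 MiB cap is my assumption from the stated file size:

Code:

record (
  xml_content: string[max=1048576];
)

Whether that changes the on-disk footprint is worth verifying against a copy of the data, but an unbounded string gives the engine no upper bound to plan around.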
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Yes. More details are needed. It could be something strange, but it might also be absolutely correct and legitimate, depending on the contents, the number of repeating nodes and subnodes, and the ultimate size of the receiving columns.

Some things to look for (and let us know what you find):

The number of "rows" that go into the xmlInput Stage and the number of "rows" that come out.

The datatypes and lengths of the columns in your Dataset, and how many columns there are.

Ernie
Ernie Ostic

blogit!
Open IGC is Here! (https://dsrealtime.wordpress.com/2015/0 ... ere/)
rameshrr3
Premium Member
Posts: 609
Joined: Mon May 10, 2004 3:32 am
Location: BRENTWOOD, TN

Post by rameshrr3 »

What is the datatype definition of the Oracle column that holds the XML data? Are you writing the entire field directly to a Sequential File / Data Set stage? What column length did you define? If an Oracle column is defined as SYS.XMLTYPE, it can hold up to 4 GB per row. Can you parse the XML? Which Oracle function are you using to read the XML out of Oracle? Is it something like

Code:

EXTRACT([ColumnName],'/').getClobVal()
or

Code:

XMLTYPE.getClobVal([ColumnName])
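For reference, the first form would typically sit inside a query something like this (the table name and alias are hypothetical):

Code:

-- pull the XMLType column out as a CLOB
SELECT EXTRACT(t.ColumnName, '/').getClobVal() AS xml_clob
  FROM my_xml_table t;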