
XML to Dataset creates Huge Dataset File and more I/O's

Posted: Tue Jul 31, 2012 1:13 am
by vishwa4ds
Hello All,

I am facing some challenges with I/O usage on the DataStage server.

The requirement is to read XMLs and load them into a dataset, then in the next job load the data from the dataset into Oracle tables. We are getting the expected results.

We were informed by the DataStage admins that a large number of I/Os occur on the DataStage server while writing data into the dataset and also while reading from it.

The admins have also told us that when we read the XMLs and write them into the dataset, the size of the dataset grows exponentially.

Our approach to reading the XML is:

We read one XML file, approximately 1 MB in size, kept in a Unix directory, using the XML stages, and write it to a dataset. While reading, the XML contents go into a single column in the XML Input stage, which is then written to the dataset.
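The "whole document in one column" pattern described above is essentially the following (a plain-Python analogy for illustration, not DataStage code; the file name and XML content are made-up stand-ins):

```python
# Plain-Python analogy of the job design described above: the whole
# XML document travels as a single string value in one column.
import os
import tempfile

# Stand-in for the real ~1 MB XML file sitting in the Unix directory.
xml_text = "<orders><order id='1'/><order id='2'/></orders>"
path = os.path.join(tempfile.mkdtemp(), "input.xml")
with open(path, "w") as f:
    f.write(xml_text)

# "Read the XML contents into one column": one file -> one row, one column.
with open(path) as f:
    row = {"xml_content": f.read()}

print(len(row["xml_content"]), "bytes in the single column")
```

At this point nothing has expanded yet; the growth happens only once the XML Input stage parses that single value into multiple output rows and columns.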

The environment is:

OS: Linux
DataStage: 8.0.1 on an 8-node system.

Please share your experience in reducing the I/Os.

Thanks in Advance.

Vishwanath

Posted: Tue Jul 31, 2012 2:20 am
by ray.wurlod
This really is a "how long is a piece of string?" question. How complex is the XML? Do they really mean "exponentially", and do they have figures to prove it?

For some XML there certainly is the potential to generate a large storage requirement, particularly if types are specified as unbounded strings.

Posted: Tue Jul 31, 2012 5:45 am
by eostic
Yes, more details are needed... it could be something strange, but it might also be absolutely correct and legitimate, depending on the contents, the number of repeating nodes and subnodes, and the ultimate size of the receiving columns.

Some things to look for and let us know:

The number of "rows" that go into the xmlInput Stage and the number of "rows" that come out.

The datatypes and lengths of the columns in your dataset...and how many columns.
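To see why those two numbers matter together, here is a rough back-of-envelope sketch. It is a hypothetical illustration: the row count and column lengths are made up, and it assumes bounded VARCHAR columns in a dataset are stored at their full declared length, so check your own job's link statistics and table definitions for the real figures.

```python
# Hypothetical illustration: why a 1 MB XML file can produce a much
# larger dataset once it is parsed into rows and fixed-width columns.

XML_FILE_SIZE = 1 * 1024 * 1024      # ~1 MB source XML (from the post)

# Assumed parse results -- made-up numbers; substitute the real values
# from your job log and table definitions.
ROWS_OUT = 5000                      # rows leaving the XML Input stage
DECLARED_COLUMN_LENGTHS = [255, 255, 4000, 100]  # declared VARCHAR lengths

# If bounded VARCHARs are stored at their declared maximum length,
# every row costs the sum of the declared lengths, regardless of how
# much actual data each column holds.
bytes_per_row = sum(DECLARED_COLUMN_LENGTHS)
dataset_size = ROWS_OUT * bytes_per_row

print(f"bytes per row:        {bytes_per_row}")
print(f"approx. dataset size: {dataset_size / (1024 * 1024):.1f} MB")
print(f"growth factor:        {dataset_size / XML_FILE_SIZE:.1f}x")
```

Trimming the declared lengths down to what the data actually needs shrinks the result proportionally, which is exactly why the row counts and column definitions are the first things to check.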

Ernie

Posted: Tue Jul 31, 2012 1:34 pm
by rameshrr3
What is the datatype definition of the Oracle column that holds the XML data? Are you writing the entire field directly to a sequential file / dataset stage? What column length did you define? If an Oracle column is defined as SYS.XMLTYPE, it can grow to up to 4 GB per row. Can you parse the XML? Which Oracle function are you using to read the XML out of Oracle? Is it something like

Code:

EXTRACT([ColumnName], '/').getClobVal()

or

Code:

XMLTYPE.getClobVal([ColumnName])
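For context, a small sketch of how either extraction expression might be embedded in a full SELECT. This is a hypothetical helper, not from the original post; the table and column names are placeholders, and it only builds the query string rather than executing it against Oracle.

```python
def build_xml_extract_query(table: str, column: str, use_extract: bool = True) -> str:
    """Build a SELECT that serializes an Oracle XMLTYPE column to a CLOB,
    using one of the two expressions discussed above."""
    if use_extract:
        expr = f"EXTRACT({column}, '/').getClobVal()"
    else:
        expr = f"XMLTYPE.getClobVal({column})"
    return f"SELECT {expr} FROM {table}"

# Placeholder table/column names for illustration only.
print(build_xml_extract_query("xml_staging", "xml_doc"))
# SELECT EXTRACT(xml_doc, '/').getClobVal() FROM xml_staging
```

Either way, the point of the question stands: if the job pulls the serialized CLOB straight into a single wide column, the dataset inherits that full declared width on every row.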