XML File size limit

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

pillip
Premium Member
Posts: 50
Joined: Thu Dec 10, 2009 10:43 am

XML File size limit

Post by pillip »

Hi,

Is there any limit on the size of the XML file which can be written from DataStage?


Thanks,
Prasanna
dsdevper
Premium Member
Posts: 86
Joined: Tue Aug 19, 2008 9:31 am

Post by dsdevper »

I tried to get that answer from IBM support. They didn't give an exact size, but they said 100 MB is the preferred size.

I tried reading a 250 MB file and was successful, but with anything larger the job terminated abnormally.

Here is the message I got from IBM:

"" Found that the XML Input stage requires, on average, 5-7 times the size of the file in memory to process the document. The memory usage is based on
the actual structure and data within the document, and on the XPATH that defined in the job.
There is a risk of random job failures with large files, even when the memory usage is optimally configured.
The recommended solution is the input XML files should be kept as small as possible. The guideline is 100 MB or less for each file. ""
pillip
Premium Member
Posts: 50
Joined: Thu Dec 10, 2009 10:43 am

Post by pillip »

I am using the XML Output stage to create XMLs. For 100 MB, what would be the approximate number of records that can fit in one file?


Thanks
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

There are too many variables. Size of tags, size of data... no real way to easily predict it. I've done as much as 500 MB in closed tests --- this is why the conservative recommendation that you received is 100 MB.

On the target side, though, you can cheat. As you have mentioned in prior threads, just create single nodes and send those "out" of XML Output and into a Sequential File stage... build the ultimate document in there by piling on thousands of nodes. I've created multi-gigabyte XML documents that way.

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Nobody wants to process LARGE xml files. Lots and lots of small ones, sure, but a small number of biggies? No thank you.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I recall an email sent back by one of the first Ardent techo people to go into China - "don't worry about 2GB as a file size, guys, here 2GB is a single transaction!"

The good news is that the next version of DataStage will have support for much larger XML file sizes, provided your hardware has the requisite memory available.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

OK, maybe just no-one in their right mind. :wink:

I've told this before, but when sending XML to Google, they have a hard 100 MB file size limit. One byte over on any file and the whole collection gets rejected. They didn't care what quantity you sent, but were quite emphatic about the individual sizes.
-craig

"You can never have too many knives" -- Logan Nine Fingers
pillip
Premium Member
Posts: 50
Joined: Thu Dec 10, 2009 10:43 am

Post by pillip »

Is there any option on the XML Output stage to split the files based on the number of records?



Thanks
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Yes. Specifically, you have the Trigger Column option, which splits the file when the value there changes. It's up to you to make it change at the right time, in this case every X records. That's how we managed things with Google: picked a number small enough that we knew it would 'never' be too big.
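The derivation behind that trigger column is simple arithmetic: a chunk key that stays constant for X consecutive rows and then increments, so the stage starts a new file each time it changes. In a real job this would be a Transformer derivation; the Python below just shows the math, and the names and the chunk size are illustrative.

```python
RECORDS_PER_FILE = 50000  # assumed chunk size; pick one safely under the limit


def trigger_value(row_number):
    """Chunk key for a 1-based row number: constant for
    RECORDS_PER_FILE consecutive rows, then increments, forcing
    a file split on each change."""
    return (row_number - 1) // RECORDS_PER_FILE
```

With a 100 MB ceiling and a known worst-case record size, `RECORDS_PER_FILE` can be chosen so that no chunk can ever exceed the limit.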
-craig

"You can never have too many knives" -- Logan Nine Fingers
arunkumarmm
Participant
Posts: 246
Joined: Mon Jun 30, 2008 3:22 am
Location: New York
Contact:

Re: XML File size limit

Post by arunkumarmm »

pillip wrote:Hi,

Is there any limit on the size of the XML file which can be written from DataStage?


Thanks,
Prasanna
I'm not sure about the exact size limit, but I have created a file of more than 500 MB. The only thing is that the job ran for 4 hours.