Issue while reading 2 GB XML file using XML Stage in 8.5FP1

Posted: Fri Sep 09, 2011 9:02 am
by gsingh
Hi DS Gurus,

I'm trying to read a 2 GB XML file using the XML Stage in 8.5. I have set the heap size to 2000 and the stack size to 500.

The problem is that the job hangs: it reads only one row and then stops. Can anyone help me solve this issue?

Thanks...

Posted: Fri Sep 09, 2011 9:06 am
by chulett
How long did you wait before you decided it was "hung"? Unless the XML has been "pretty printed" or formatted for people, don't forget that each file is essentially only one long record.
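A quick way to see this point (illustrative only; /tmp/oneline.xml is a throwaway sample, not a file from this job):

```shell
# An XML file that has not been "pretty printed" contains no newlines,
# so a line/row-oriented reader sees the whole document as one record.
printf '<root><row id="1"/><row id="2"/></root>' > /tmp/oneline.xml
wc -l < /tmp/oneline.xml   # counts newline characters: prints 0
```

A 2 GB document like this is one 2 GB "line", which is why progress can look like a hang.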

Posted: Fri Sep 09, 2011 9:11 am
by gsingh
Hi Craig,

I waited for 6 hours and it kept on running; no log entries were written after the first 3 minutes from when the job was launched.

Posted: Fri Sep 09, 2011 10:35 am
by chulett
Let's wait and see what Ernie's thoughts on this are; he's the most familiar with the 8.5 changes to the XML handling around here.

Posted: Fri Sep 09, 2011 12:36 pm
by eostic
First thing is the most critical --- does the job read the file perfectly with a test document that is 1k?

Ernie

Posted: Sat Sep 10, 2011 10:54 pm
by gsingh
eostic wrote:First thing is the most critical --- does the job read the file perfectly with a test document that is 1k?

Ernie ...
Yes Ernie. The job reads the 1k file without any issues.

Please advise what has to be done.

I increased the heap size to 4096 and tried again, but it didn't work.

Re: Issue while reading 2 GB XML file using XML Stage in 8.5F

Posted: Sat Sep 10, 2011 11:24 pm
by qt_ky
Any time I see "2 GB" I get suspicious because a lot of operating systems, by default, impose 2 GB limitations on file size and those can cause errors or hangs. Check your OS (ulimit -a) or check with your DataStage and/or Unix administrator to make sure that the right ulimit file size setting is set to unlimited.
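A minimal version of that check, run as the ID that actually executes the jobs (ulimit is a shell builtin):

```shell
# "ulimit -f" reports the maximum file size (in 512-byte blocks) this
# shell and its children may create; "unlimited" is what you want when
# working with multi-GB files.
ulimit -f
ulimit -a   # full listing of all limits, as suggested above
```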

Re: Issue while reading 2 GB XML file using XML Stage in 8.5F

Posted: Sun Sep 11, 2011 7:56 am
by gsingh
qt_ky wrote:Any time I see "2 GB" I get suspicious because a lot of operating systems, by default, impose 2 GB limitations on file size and those can cause errors or hangs. Check your OS (ulimit -a) or check with your DataStage and/or Unix administrator to make sure that the right ulimit file size setting is set to unlimited.
Hi Ernie,

I ran the ulimit -a command on our server; here is the result:
ulimit -a
time(seconds) unlimited
file(blocks) unlimited
data(kbytes) unlimited
stack(kbytes) unlimited
memory(kbytes) unlimited
coredump(blocks) 2097151
nofiles(descriptors) unlimited
threads(per process) unlimited
processes(per user) unlimited


I see the file size is set to unlimited.

Please help!

Posted: Sun Sep 11, 2011 8:10 am
by qt_ky
I am not Ernie, but thanks for the compliment.

Just to be sure, because ulimit is such a common problem: have you tried running the "ulimit -a" command as part of the before-job subroutine (ExecSH), and is the output you sent taken from the job log? I'm highlighting this because there can be differences between the ID you telnet in with and the ID that actually executes the job.

If you do the "ls -l" command on the 2 GB file, what is the exact size of the file in bytes?
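For reference, one way to pull the exact byte count out of "ls -l" (the file here is a throwaway 5-byte sample, not the 2 GB XML file from this thread):

```shell
# Field 5 of a long listing is the size in bytes on common Unix systems.
printf 'abcde' > /tmp/size_demo                 # create a 5-byte file
ls -l /tmp/size_demo | awk '{print $5}'         # prints 5
```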

Posted: Sun Sep 11, 2011 8:30 am
by chulett
Exactly. You must run the ulimit command from inside the job's environment, not simply at the command line. If that's what you did.

Posted: Sun Sep 11, 2011 8:47 am
by gsingh
Here is the file size: 2295161299 bytes.

I am not sure how to run the command in a before-job subroutine. Can you please tell me exactly where I need to enter the command?

Thanks

Posted: Sun Sep 11, 2011 9:20 am
by qt_ky
2 GB is 2147483648 bytes, so your file is a bit over 2 GB. This may not be the issue, but it's worth trying to rule it out.
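The arithmetic above, spelled out as a sketch (the byte count is the one reported in this thread):

```shell
# 2 GB is 2^31 bytes; the reported file size crosses that boundary,
# which is exactly where 32-bit file offsets overflow.
TWO_GB=$((2 * 1024 * 1024 * 1024))
FILE_SIZE=2295161299
echo "$TWO_GB"                                 # prints 2147483648
[ "$FILE_SIZE" -gt "$TWO_GB" ] && echo "file is over the 2 GB boundary"
```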

Go into your test job, the one that does not hang, and go into Job Properties. On the General tab under Before-job subroutine choose ExecSH. For the Input Value, enter an OS command: ulimit -a

When you run the test job, you should find the output from the command in the job log.

Posted: Sun Sep 11, 2011 9:57 am
by gsingh
I have tried this, and the job log also shows the file size as unlimited.

Please let me know what I can try next.

Thanks..

Posted: Sun Sep 11, 2011 6:11 pm
by eostic
The next thing is to understand exactly how you are implementing the 8.5 xml stage....

What method, in your 1k working example, are you using to pass the document or the name of the document to the xml Stage? There are many ways to do it.

Please describe the structure of your Job (stages and their order) and how you have configured your Input and xmlParser Steps within the Assembly.

Ernie

Posted: Mon Sep 12, 2011 1:40 am
by gsingh
eostic wrote:The next thing is to understand exactly how you are implementing the 8.5 xml stage...What method, in your 1k working example, are you using to pass the document or name of the document to the xml ...
Hi Ernie,
We have used an External Source stage to pass the XML document name to the XML Stage. Here is the command: ls tagetfilepath/filename
The job has the following stages:

External Source stage ---> XML Stage ---> Data Set

In the XML Stage, under Usage, we have set the heap size to 4096, the stack size to 3000, and threads to 4.
In the Input step: one column from the External Source stage is passed, of type varchar, length 9999.
In the XML_Parser step: we have used the file set option, and for validation we have chosen Minimal validation.
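As a sketch of what that External Source command produces (/tmp/demo.xml is a stand-in path, not the real 2 GB file): the stage emits one file path per line, and a parser step in file-set mode then opens that path rather than receiving the document contents.

```shell
# The "ls path/filename" command prints the file's path, not its
# contents; the downstream parser is expected to open the path itself.
printf '<root/>' > /tmp/demo.xml
ls /tmp/demo.xml    # prints: /tmp/demo.xml
```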

Please advise......

Thanks