
Folder Stage seems to read file content when only filenames are needed

Posted: Fri Aug 08, 2014 4:11 am
by mario_j
Hi,
we have a server job that is supposed to read all filenames from a directory and write them into a database table.
Within the Folder stage, only one column, for the filename, is defined.
The job aborts when the directory contains a file larger than 500 MB, or when all files together are bigger than 500 MB.
It does not matter what I do within the job: even without sorting or transforms, writing only to a hashed file, the folder can be a little bigger than when writing to a DB stage, but the job still aborts with a uvmalloc() memory exceeded error.
If the files within the directory are smaller than 500 MB, the jobs work fine.
If I delete some entries from the 500 MB file so that it becomes smaller, the job works.

I can rebuild it as a parallel job with an External Source stage using the ls command, and then it works. But using a parallel job is not desirable, because of server routines that are used.

First question: does the Folder stage really read all the file content first, even if I only use one column for the filename?

Second: is there a parameter that limits memory for reading files, or for the Folder stage?

Thanks,
Mario

Re: Folder Stage seems to read file content when only filenames are needed

Posted: Fri Aug 08, 2014 9:52 am
by chulett
mario_j wrote: does the Folder stage really read all the file content first, even if I only use one column for the filename?
From what I recall and what you've documented - yes. If literally all you need are the filenames, I'd suggest an ExecSH command Before Job to list the filenames to a flat file and then use that file as your source in the job.
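For example, a minimal sketch of such a before-job ExecSH command (the directory and output file paths here are just placeholders):

    ls -1 /data/incoming > /tmp/filenames.txt

The job can then read /tmp/filenames.txt through a Sequential File stage with a single VarChar column holding the filename.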

Posted: Thu Aug 28, 2014 3:57 pm
by chulett
Did you ever get this resolved? Stumbled back on this today and just wanted to add that you could also use the Filter option of the Sequential File stage to feed in just the filenames using something like an "ls -1 <pattern>" O/S command.
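For instance, a sketch with a hypothetical path and pattern: enable the stage's Filter option and set the filter command to

    ls -1 /data/incoming/*.txt

so the stage reads the command's output, one filename per row, instead of the contents of an actual file.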

Posted: Sun Aug 31, 2014 2:47 pm
by mario_j
We used a workaround: a Command stage instead of the Folder stage.
We also opened a PMR, but support only confirmed what we already knew and could not help further.
It may be a problem with the block size on our Windows server: the process cannot allocate a large enough block.