Loading a full file into a field?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

simc
Premium Member
Posts: 16
Joined: Wed Jul 22, 2015 12:10 pm

Loading a full file into a field?

Post by simc »

So I have a requirement to load a full file into a single database field.

I've tried a couple of different approaches. The first was to cat the file in a command stage and use the result as a variable to insert into the database, but that variable seems to have a size limitation.

The second was to read the file with a Sequential File stage, but I can't get it to read the whole file into one record with one field... it reads each line into its own record. I then tried updating a single row and appending each line of the file to it; that works, but takes about four minutes per file.

So my question is: does anyone have an elegant solution for reading in a full file and inserting it into one record?
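For reference, the end goal here (read an entire file into one value) is a one-liner outside DataStage. A minimal Python sketch; the temporary file simply stands in for the real input:

```python
# Minimal sketch: read an entire file into a single string value.
# A temporary file stands in for the real input file here.
import os
import tempfile
from pathlib import Path

def read_whole_file(path):
    # read_text() returns the full contents as one string, newlines included
    return Path(path).read_text()

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("line 1\nline 2\n")
    tmp = f.name

contents = read_whole_file(tmp)
os.unlink(tmp)
print(repr(contents))  # one value containing every line of the file
```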

Thanks from a relative newbie!
Charles
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Curious what kind of field we're talking about here, a CLOB?
-craig

"You can never have too many knives" -- Logan Nine Fingers
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Could you add the following please?

Database type and the definition of the "field" that is being used to store the data.

Maximum expected file size (if you have it) - is it megabytes, terabytes, etc.

Format for the original file - does it contain fixed length records, variable length, Unix or DOS format?

More detail is better!
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
simc
Premium Member
Posts: 16
Joined: Wed Jul 22, 2015 12:10 pm

Post by simc »

It's an Oracle 12c database; the field type is CLOB.

The file format is simple text, and the samples I have are mostly under 200 KB, with a couple in the 1.5 MB range. I can't see them being more than 3-4 MB.

The format is fixed-length records, DOS format I believe.

Here's a quick sample, with the formatting a little off:


220160101CLO CLO 1ADR 1 1       24700        3705        3705           0           0           0           0       45280
220160101CLO CLO 1ADR 1 2        2800         420         420           0           0           0           0        1915
220160101CLO CLO 1ADR 1 3        2500         375         375           0           0           0           0        1915
220160101CLO CLO 1ADR 1 5       21900        5212        5212           0           0           0           0        4175
220160101CLO CLO 1ADR 1 6       13800        3558        3558           0           0           0           0        7695
220160101CLO CLO 1ADR 1 7           0           0           0           0           0           0           0           0
220160101CLO CLO 1ADR 1 8           0           0           0           0           0           0           0           0
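Since the records are fixed length, each line can be sliced into columns by character position. A rough Python sketch against the first sample record above; the column boundaries are guessed from the sample, not taken from any file spec, so verify them before relying on this:

```python
# Rough sketch: slice one fixed-width record into fields.
# Column boundaries are GUESSED from the sample data above --
# check them against the real record layout before use.
sample = ("220160101CLO CLO 1ADR 1 1       24700        3705        3705"
          "           0           0           0           0       45280")

header = sample[:9]    # looks like a record type + date: "220160101"
codes  = sample[9:25]  # "CLO CLO 1ADR 1 1" region (assumed one block here)

# The rest looks like eight right-aligned numbers, 12 characters each;
# int() tolerates the leading spaces.
numbers = [int(sample[i:i + 12]) for i in range(25, 25 + 8 * 12, 12)]
print(header, codes, numbers)
```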
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

The forum software removes all 'extra' whitespace, hence the formatting issue. Wrap anything you want to preserve in [code] tags, something I did for your post.

FYI, some quick searches out in the wild turn up several "Ask Tom" questions similar to this, all of which were directed toward treating the file as an external table and using PL/SQL to load it into the CLOB. Of course.  :wink: Not quite sure how you would turn that into a DataStage job off the top of my head.
-craig

"You can never have too many knives" -- Logan Nine Fingers
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

This is not as easy as I thought it would be.

Can you pipe the output of cat (Unix) or type (Windows) into it?
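Capturing the output of cat (or type on Windows) into a single variable is straightforward from a script. A sketch in Python, assuming a Unix-like system with cat on the PATH; a temporary file stands in for the real input:

```python
# Sketch: capture the full output of `cat <file>` into one variable.
# Assumes a Unix-like system with `cat` available on the PATH.
import os
import subprocess
import tempfile

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("row 1\nrow 2\n")
    tmp = f.name

# capture_output=True collects stdout; text=True decodes it to a string
result = subprocess.run(["cat", tmp], capture_output=True, text=True, check=True)
captured = result.stdout  # the whole file as one string
os.unlink(tmp)
print(repr(captured))
```

Note the same size concern the original poster hit with a command stage variable would not apply here, since the output lives in ordinary process memory.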
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

This may be a silly question, but your file seems very much like structured data to me... why do you want to load the FILE into the database when loading the DATA into a TABLE would be best?

You could add a few columns for filename and timestamps if needed... but I think a regular table with 13 (+?) fields would suit you best.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You could try setting the line terminator to None (or to something that will never occur in the data) in the Sequential File stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
eostic
Premium Member
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Or use a Server Job. The Folder Stage will easily pull the entire contents into a single column "as is". For files of a few hundred MB or less, it works perfectly every time.

Ernie
Ernie Ostic

blogit!
Open IGC is Here! (https://dsrealtime.wordpress.com/2015/0 ... ere/)
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Good catch, Ernie.
One can encapsulate the Folder stage in a Server Shared Container and use that in a parallel job, if that's what you need.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
UCDI
Premium Member
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

PaulVL wrote:This may be a silly question, but your file seems very much like structured data to me... why do you want to load the FILE into the database when loading the DATA into a TABLE would be best?

You could add a few columns for filename and timestamps if needed... but I think a regular table with 13 (+?) fields would suit you best.
Version/source control for code is nothing but a database of files, so there are times when that sort of thing can be useful. We archive a few generated files to the database for various reasons. If nothing else, it beats having a massive archive folder: it makes pulling by date/time easier, makes deleting outdated 'files' easier, and backup is taken care of. I guess all that can be summed up as "convenience".