Automate loading flat files

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
kinu008
Premium Member
Premium Member
Posts: 36
Joined: Fri Mar 16, 2007 1:35 pm

Automate loading flat files

Post by kinu008 »

Hi all,
Recently we are getting requests to load hundreds of flat files into our Netezza database using DataStage. I have used RCP feature for SQL Server and Oracle sources and that DataStage in those cases automatically generated and created the tables on the fly in target Netezza. However, for flat files sources, RCP works differently as the tool is expecting predefined schema to be present. Since we are loading hundreds of files it is not practical to create schema definitions manually as it is very time consuming.

So just checking with the experts here to see if there is a way to automatically create table with proper data types plus load the flat file data using DataStage? If DataStage can't do that then are there any other techniques using AIX/PERL to load the files?

Thank you, Kiran
UCDI
Premium Member
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

what DO you have? Does the flat file schema match a target or other database that has meta-data you could import?

if it matches the target you can export the target table def and use your perl or other language to parse that and crank out a schema file.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

So... you've got hundreds of files to load, all with potentially different metadata, and for each one you need to create - on the fly - a target table to store each in? Really? If not, can you be as specific as possible with your requirements, please? Not sure how one would go about trying to automatically infer "proper data types" from flat files with all of the perils that would bring to the table. With any tool. :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
kinu008
Premium Member
Premium Member
Posts: 36
Joined: Fri Mar 16, 2007 1:35 pm

Post by kinu008 »

UCDI,

There is no target table.. I have to create one using the file.
kinu008
Premium Member
Premium Member
Posts: 36
Joined: Fri Mar 16, 2007 1:35 pm

Post by kinu008 »

chulett,

it's census data (age, race, ethnicity, education level, etc..) for continental US and the data is spread over in 90 files..

There is a tool called Aginity (SQL query tool for Netezza), which analyzes the data and comes up proper table definitions and then creates the table on the fly. Unfortunately it can't load 10's of files and i have to manually choose one file at a time. So I was wondering if DataStage has a similar feature that I am missing or is there any other way to accomplish the task?
UCDI
Premium Member
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

I think IA can do this sort of, and I think IA can read files instead of databases, but not 100% sure as I don't use IA.

It would be fairly trivial to do this in an external program.
I mean, there are only "really" 4 types:

integer (size, irrelevant, use 64 bit int)
decimal (size irrelevant, use IEEE 'double' 64 bit type or 80 bit type if you like)
string
date (hardest one to infer if various formats used, easy if a consistent format is used).

if the data is high quality, this will work. If it is low quality, everything will end up being a string because each column will have incompatible data for the other types and default it to string.
kinu008
Premium Member
Premium Member
Posts: 36
Joined: Fri Mar 16, 2007 1:35 pm

Post by kinu008 »

Thanks UCDI.. We bought information analyzer and we are in the process of installing and configuring but it will take some time before we fully utilize IA.

Besides IA, anyone has any other ideas? Does DataStage has the feature to analyze data and determine the data types for flat file sources?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Automatically? No.
-craig

"You can never have too many knives" -- Logan Nine Fingers
thompsonp
Premium Member
Premium Member
Posts: 205
Joined: Tue Mar 01, 2005 8:41 am

Post by thompsonp »

What is the source of the census data?
Doesn't the source / supplier have meta data for each one? If not how are you determining what each one contains?
Post Reply