DSXchange: DataStage and IBM Websphere Data Integration Forum
View next topic
View previous topic
Add To Favorites
Author Message
kinu008



Group memberships:
Premium Members

Joined: 16 Mar 2007
Posts: 36

Points: 417

Post Posted: Wed Oct 11, 2017 7:05 am Reply with quote    Back to top    

DataStage® Release: 9x
Job Type: Parallel
OS: Unix
Hi all,
Recently we are getting requests to load hundreds of flat files into our Netezza database using DataStage. I have used RCP feature for SQL Server and Oracle sources and that DataStage in those cases automatically generated and created the tables on the fly in target Netezza. However, for flat files sources, RCP works differently as the tool is expecting predefined schema to be present. Since we are loading hundreds of files it is not practical to create schema definitions manually as it is very time consuming.

So just checking with the experts here to see if there is a way to automatically create table with proper data types plus load the flat file data using DataStage? If DataStage can't do that then are there any other techniques using AIX/PERL to load the files?

Thank you, Kiran
UCDI



Group memberships:
Premium Members

Joined: 21 Mar 2016
Posts: 273

Points: 2756

Post Posted: Wed Oct 11, 2017 8:42 am Reply with quote    Back to top    

what DO you have? Does the flat file schema match a target or other database that has meta-data you could import?

if it matches the target you can export the target table def and use your perl or other language to parse that and crank out a schema file.
Rate this response:  
chulett

Premium Poster


since January 2006

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 12 Nov 2002
Posts: 42272
Location: Denver, CO
Points: 217063

Post Posted: Wed Oct 11, 2017 9:14 am Reply with quote    Back to top    

So... you've got hundreds of files to load, all with potentially different metadata, and for each one you need to create - on the fly - a target table to store each in? Really? If not, can you be as specific as possible with your requirements, please? Not sure how one would go about trying to automatically infer "proper data types" from flat files with all of the perils that would bring to the table. With any tool. Confused

_________________
-craig

Watch out where the huskies go and don't you eat that yellow snow
Rate this response:  
kinu008



Group memberships:
Premium Members

Joined: 16 Mar 2007
Posts: 36

Points: 417

Post Posted: Wed Oct 11, 2017 10:58 am Reply with quote    Back to top    

UCDI,

There is no target table.. I have to create one using the file.
Rate this response:  
kinu008



Group memberships:
Premium Members

Joined: 16 Mar 2007
Posts: 36

Points: 417

Post Posted: Wed Oct 11, 2017 10:58 am Reply with quote    Back to top    

chulett,

it's census data (age, race, ethnicity, education level, etc..) for continental US and the data is spread over in 90 files..

There is a tool called Aginity (SQL query tool for Netezza), which analyzes the data and comes up proper table definitions and then creates the table on the fly. Unfortunately it can't load 10's of files and i have to manually choose one file at a time. So I was wondering if DataStage has a similar feature that I am missing or is there any other way to accomplish the task?
Rate this response:  
UCDI



Group memberships:
Premium Members

Joined: 21 Mar 2016
Posts: 273

Points: 2756

Post Posted: Wed Oct 11, 2017 11:53 am Reply with quote    Back to top    

I think IA can do this sort of, and I think IA can read files instead of databases, but not 100% sure as I don't use IA.

It would be fairly trivial to do this in an external program.
I mean, there are only "really" 4 types:

integer (size, irrelevant, use 64 bit int)
decimal (size irrelevant, use IEEE 'double' 64 bit type or 80 bit type if you like)
string
date (hardest one to infer if various formats used, easy if a consistent format is used).

if the data is high quality, this will work. If it is low quality, everything will end up being a string because each column will have incompatible data for the other types and default it to string.
Rate this response:  
kinu008



Group memberships:
Premium Members

Joined: 16 Mar 2007
Posts: 36

Points: 417

Post Posted: Fri Oct 13, 2017 7:07 am Reply with quote    Back to top    

Thanks UCDI.. We bought information analyzer and we are in the process of installing and configuring but it will take some time before we fully utilize IA.

Besides IA, anyone has any other ideas? Does DataStage has the feature to analyze data and determine the data types for flat file sources?
Rate this response:  
chulett

Premium Poster


since January 2006

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 12 Nov 2002
Posts: 42272
Location: Denver, CO
Points: 217063

Post Posted: Fri Oct 13, 2017 7:19 am Reply with quote    Back to top    

Automatically? No.

_________________
-craig

Watch out where the huskies go and don't you eat that yellow snow
Rate this response:  
thompsonp



Group memberships:
Premium Members

Joined: 01 Mar 2005
Posts: 187

Points: 1709

Post Posted: Fri Oct 13, 2017 8:22 am Reply with quote    Back to top    

What is the source of the census data?
Doesn't the source / supplier have meta data for each one? If not how are you determining what each one contains?
Rate this response:  
Display posts from previous:       

Add To Favorites
View next topic
View previous topic
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum



Powered by phpBB © 2001, 2002 phpBB Group
Theme & Graphics by Daz :: Portal by Smartor
All times are GMT - 6 Hours