Always load sources in varchar only schemas

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
IASAQ
Premium Member
Premium Member
Posts: 31
Joined: Wed May 04, 2016 11:07 am
Location: Montréal

Always load sources in varchar only schemas

Post by IASAQ »

I'm having a debate with a colleague concerning the way to load sources in a schema. He likes to hunt for datatypes and lengths when building his source schemas before processing and I like to load my source in schemas that have only varchar datatypes before processing. Of course, the data will be transformed back to the proper datatypes before reaching the target.

Do you see any cons with this way of working with varchar only schemas when processing data? If it's not clear, please let me know and I'll clarify the situation.

Thanks
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

I can think of several reasons not to use VarChar only schemas.

Every time a job has to change a column from one datatype to another there is a large amount of expense involved in terms of CPU utilization. That gets multiplied for every record that flows through the job. If you are flipping fields back and forth between datatypes then that means your jobs will run slower.

By keeping fields in their native formats, jobs will notify you of content errors (for example storing a character in an integer) as soon as it occurs, instead of downstream at the end of the process.

And as a best practice, we should only change things when we have to change them. By adding unnecessary changes to data, you add more chances to make a mistake.

I'm also not certain why your colleague "hunts" for data specifications. Those can be easily imported from most database tables. Even when designing against early specifications where the tables aren't built, best practice is that data types should be specified, even if they have to be tweaked later.
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

For what it's worth, the only time I've ever had a need for a "varchar only schema" is when we were getting crap data from older systems via flat files. That allowed all of the data to actually load and then we could perform whatever validations were needed before cleaning / converting it and moving it deeper into the database, that or kicking it back. Things like dates of "01/33/2012" were some of my favorites. :wink:

Haven't seen any such need when going from database to database that I can recall.
-craig

"You can never have too many knives" -- Logan Nine Fingers
IASAQ
Premium Member
Premium Member
Posts: 31
Joined: Wed May 04, 2016 11:07 am
Location: Montréal

Post by IASAQ »

asorrell wrote:I'm also not certain why your colleague "hunts" for data specifications. Those can be easily imported from most database tables. Even when designing against early specifications where the tables aren't built, best practice is that data types should be specified, even if they have to be tweaked later.
Because a majority of the queries we have to work with in our Data Manager to Datastage migration don't have simple "select x from y" constructs and the right datatypes have to be guessed, especially with numeric info. With Data Manager, I never had to deal with source or target datatypes, the conversion was properly handled internally.

The main reason why I use varchar only schema at the source is that it's consistent and the conversion functions work well. I'll have to think about the replies made so far.

Thanks for your input, it's appreciated.
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Ah - ok. So your site is a recent Cognos conversion. I didn't realize that from your initial post. Yes, there will be some significant changes from Data Manager. Welcome to DataStage, even if your move was a probably a bit involuntary!

Though both of these are a bit dated now, you might want to take a look at two of the IBM Redbooks, mainly because they have more examples than the standard web-based documentation.

IBM InfoSphere DataStage Data Flow and Job Design
http://www.redbooks.ibm.com/abstracts/sg247576.html

InfoSphere DataStage Parallel Framework Standard Practices
http://www.redbooks.ibm.com/abstracts/sg247830.html

There are a number of Redbooks available on related topics (search for DataStage and Information Server), so check for others that may also be relevant to your site.

In the older Redbooks some "retired" stages may be shown that have now been replaced with better stages (for example enterprise stages have been replaced with database connectors). However the examples are still useful and should help answer questions when they pop up.
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
Post Reply