Add a new column without using Transformer Stage

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

dwh_user
Premium Member
Posts: 14
Joined: Thu Oct 18, 2018 8:08 pm
Location: Sydney

Add a new column without using Transformer Stage

Post by dwh_user »

Hi,
I have a requirement to read data from 7 different tables using an ODBC stage and apply some logic. The table structure is different for each table, so I want to add new columns with NULL values to make them match. I don't want to use a Transformer stage, as I would need seven of them.

It would be a great help if anyone has an idea for adding extra columns with NULL values without using a Transformer.

Thanks
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Without getting into why you would need seven Transformers for this, they're not how you "add a new column" without a derivation, without a value. If you really want to use a stage for this, then the Modify stage would be the answer, I would think. Better yet, let the database handle the NULL, as it should be the default value for the columns in question (yes?), so don't do anything - don't even mention them. Your job will insert the proper derived values into the columns that require them, and any not mentioned will default. Easy peasy.

Now, I may be cornfusing Informatica and DataStage, but we always keep all columns in our metadata and simply don't "hook them up", so they're not included in the generated DML and thus default. From what I recall of the DS way, simply omitting them from the metadata you use in jobs like this would accomplish the same thing.
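
To illustrate, here is a minimal SQL sketch of the "let the database default it" approach; the table and column names are hypothetical. A nullable column with no explicit DEFAULT simply comes back NULL when the generated INSERT leaves it out:

    -- Hypothetical target: extra_col is nullable with no explicit DEFAULT.
    CREATE TABLE target_tbl (
        key_col   INT          NOT NULL,
        val_col   VARCHAR(50)  NOT NULL,
        extra_col VARCHAR(50)  NULL
    );

    -- The DML the job generates when extra_col is omitted from the metadata:
    INSERT INTO target_tbl (key_col, val_col) VALUES (?, ?);

    -- extra_col arrives as NULL without the job touching it.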

My two cents.
-craig

"You can never have too many knives" -- Logan Nine Fingers
dwh_user
Premium Member
Posts: 14
Joined: Thu Oct 18, 2018 8:08 pm
Location: Sydney

Post by dwh_user »

The reason for making the metadata the same across all 7 tables is so I can Funnel the data and bring it into one link.

I can't do this in the source query by selecting "NULL as ColumnName" because that is not allowed under our coding standards.
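
For reference, the disallowed approach would look roughly like this (a sketch with hypothetical table and column names); the typed CAST is what keeps the column metadata identical across the funneled sources:

    SELECT key_col, val_col, CAST(NULL AS VARCHAR(50)) AS extra_col
    FROM   source_tbl_1
    UNION ALL
    SELECT key_col, CAST(NULL AS VARCHAR(50)), extra_col
    FROM   source_tbl_2;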

I also tried to handle this in the database, but our environment uses the ODBC stage to load the data, and when we supply fewer columns from the source than exist in the target table, it throws the error "All columns of the table must be specified in the INSERT BULK statement".

If I switch away from bulk insert by changing the Array Size to 1, the job takes 7-8 minutes to load only 20-30K records.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Okay... always best to include relevant details right off the bat. A few quick thoughts:

1. That's a bit of an odd "code standard" IMHO, but okay
2. SQL Server, I assume, since Windows and ODBC
3. What's your target volume? How often will this run?
4. Why are you setting the Array Size to 1? Conventional (non-bulk) inserts certainly don't require that, unless perhaps you have LOBs to handle?

Again, relevant details help.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

And since you didn't mention it, Modify stage? Pretty sure that should be an option here.
-craig

"You can never have too many knives" -- Logan Nine Fingers
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

What do you already have? Several stages will let you inject a column at will in 11.x. You can also add one in your extract if it comes from a database: just add a column of the type and name you need to the SELECT (it does not have to be NULL; are you allowed to inject a non-NULL column?). Transformers used to be a performance hit, but not in 11.x, so that is not a bad way to go; not sure why you want to avoid it.
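
As a minimal sketch of that suggestion (table and column names hypothetical), the injected column can be a typed constant in the extract, so every source presents the same layout:

    -- orders_no_region lacks a region column; inject a typed constant
    -- so this extract matches the layout of the other six sources.
    SELECT order_id,
           order_amt,
           CAST('UNKNOWN' AS VARCHAR(20)) AS region
    FROM   orders_no_region;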

You could also make a sequence job that calls the same parallel job 7 times with different parameters and builds datasets. The parallel job can write the data in the funnel-ready format, and one final job reads, funnels, and loads. If your swap disk is fast, there is no real performance hit here, so this is useful if your setup includes a powerful swap area.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I'm thinking Column Generator stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.