Is this possible with Runtime column propagation?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

DSUser2000
Participant
Posts: 42
Joined: Tue Oct 20, 2009 8:36 am

Is this possible with Runtime column propagation?

Post by DSUser2000 »

I have to create lots of jobs which basically just put data from stage to core tables. Only some technical fields need to be added (creation time, jobname) but no fields changed. We would need to do a change capture to check if data has changed and do an update or insert based on this (if nothing has changed, nothing should be done).

Is this possible with runtime column propagation or is using RCP a bad idea?

A problem I got into when trying to build a test job: How can I define which columns are key values? I need this for doing the Update in SQL. Or should I use custom sql here?
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

So define only those columns that you wish to assing as key and Values in the Change capture stage. And let the other columns to propagate.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Re: Is this possible with Runtime column propagation?

Post by kwwilliams »

You can do it, but it could get complex.

For the update you could create a wrapper to one of the datbase stages, and expose the key through a construct like a schema file. You could pass the where clause into the job as a parameter (I have seen negative impact on performance through use of this option).

For your CDC you would also have to use schema files to produce expose the key fields to run through a CDC and produce an output.

It can be done, the question to you is how adept your organization would be at supporting a complex job that would have no design time metadata on the job when issues arise. If you feel your staff is up to the task then it would be worth investigating.
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

kumar_s wrote:So define only those columns that you wish to assing as key and Values in the Change capture stage. And let the other columns to propagate. ...
This would only be a reusable job if all of the tables had the same key that needed to be exposed. I believe he is asking for a generic job that can be used regardless of the key value.
DSUser2000
Participant
Posts: 42
Joined: Tue Oct 20, 2009 8:36 am

Post by DSUser2000 »

You are right: The key values are different and could be handled over as Job parameters or so.
Performance should not be an issues because these are rather small tables (<100 entries).
I didn't understand what you meant with the CDC stage. I don't see an option to set a schema file there?
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

Expose the column with the column import stage with some generic name such as "keyfield". Then your CDC would use this column as the key and compare all values for change.
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ
Contact:

Post by mhester »

As Keith has pointed out in at least 3 posts this is pretty straight forward. Simply use a column import for the key, use a column import to get the values to compare and you are done. Do not make these paramters to the job - not necessary.

The CDC stage does not accept a schema - the columns need to be present on the link before this stage.

Also, in the column import stage reset the property for "Keep Imported Column" to True this way you do not need to use a complimentary column export to put the column back on the link after the CDC, although you may need a modify later on to drop the generic columns or use the database stage (if that is your target) and set "Drop Unmatched" to True.
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

Offhand topic:

If you are building a data warehouse, your architecture is odd. Usually warehouses are designed to track changes overtime to gain understanding of a particular dynamic within your organization. If you are overriding all of your data with updates, then you don't really have a basis with which to perform this type of analysis. You can ignore this if not applicable. If you guys are just starting out and you haven't taken tracking of changes overtime into your architecture you may want to ask yourselves some questions about what the business may need in the future.
FranklinE
Premium Member
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Re: Is this possible with Runtime column propagation?

Post by FranklinE »

DSUser2000 wrote:Is this possible with runtime column propagation or is using RCP a bad idea?
I was going to post a semi-sarcastic reply to this, but with Keith's "offhand" comment I believe it deserves serious consideration: In any business context where the application is expected to evolve over time, isn't RCP really just a bad idea?

I spent over two years maintaining and enhancing a data warehouse application. RCP was the bane of my existence every step of the way, particularly at 0200 when the support pod called and I needed to diagnose and fix the problem by 0600, with no metadata in production, my workstation (and DS software) 30 miles away and my remote connectivity dead or intermittent at best. :?
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ
Contact:

Post by mhester »

I was going to post a semi-sarcastic reply to this, but with Keith's "offhand" comment I believe it deserves serious consideration: In any business context where the application is expected to evolve over time, isn't RCP really just a bad idea?
Be very careful when you make such statements to such a large audience with no support for your position. Many of the developers/architects/managers that rely on this site for information might take away something that is absolutely 100% incorrect. RCP, when used correctly, is a wonderful tool and IBM is building more and more infrastructure around it with each release.

Please refrain from posting such diatribe about something that is core and fundamental in the product.

If you are not happy with the outcome of what you built and how to support it then I might suggest that you did not implement in the best possible manner.
FranklinE
Premium Member
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

mhester wrote:Be very careful when you make such statements to such a large audience with no support for your position. Many of the developers/architects/managers that rely on this site for information might take away something that is absolutely 100% incorrect. RCP, when used correctly, is a wonderful tool and IBM is building more and more infrastructure around it with each release.

Please refrain from posting such diatribe about something that is core and fundamental in the product.

If you are not happy with the outcome of what you built and how to support it then I might suggest that you did not implement in the best possible manner.
I take your point as constructive. I also respectfully point out that I did not nor do not represent myself as either speaking for IBM, as an expert whose every word must be carefully edited for an audience, or assertively criticizing DataStage in the general sense.

If the moderators of this forum have a different perspective, if my profile setting of "developer" is being contradicted by the content or tone of any of my posts including the one above, then they can edit or delete such posts as they see fit, and would need to explain the action to me privately or not at all as they prefer.

Be well.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

I was going to post a semi-sarcastic reply to this, but with Keith's "offhand" comment I believe it deserves serious consideration: In any business context where the application is expected to evolve over time, isn't RCP really just a bad idea?
If you took the semi-sarcastic response out of your post, then your post would deserve some serious consideration. So in providing useful information I am going to ignore it and answer your question as though it were not present.

RCP used correctly is a very powerful tool that will actually allow your organization to evolve over time without many of the headaches associated with design time information. Adding columns to a table is less intensive from a development side because the entire mapping does not need to be altered. This will allow your organization to be more agile in what you develop. I have seen some wonderful development using RCP.

On the other hand, if your organization is new to DataStage. Then RCP would be a tough concept to digest. Moving from any other ETL tool or hand-coding to DataStage is a difficult adjustment for many people. However, as the organization evolves using RCP should be reconsidered in scenarios where logic is repeated several times (such as with the original poster).

This is no different than utilizing another language, don't move to the advance topics until you have the basics covered. Its obvious you're not comfortable yet, so you might just want to stick to the basics.
DSUser2000
Participant
Posts: 42
Joined: Tue Oct 20, 2009 8:36 am

Post by DSUser2000 »

@kwwilliams:
Thanks a lot. I will try it with column export/import.

Regarding your concerns with overwriting these tables: Well, we have a pretty huge datawarehouse with historization (we've 4 dates for this: editing time and end, effectiveness date and end). However, the tables I'm speaking about in this thread are declarative tables which are manually maintained by the departments which deliver data to us. There's no need for historization in them. However, there are quite often changes (added or deleted columns) which would make lot of work for us if we always have to adjust the jobs.

We're also generally thinking about reducing the number of new jobs (and probably also change old ones) in order to improve maintenance (faster changes and more uniformity of jobs). We've already got more than 3000 jobs and 500 sequences now (most of them are quite complex and so RCP is not an option there) and we're thinking about consolidating some of the simple jobs using RCP because they stick to different development guidelines (different project teams over a longer time).
However, we don't have experience handling RCP and so we don't know if it's more a hazzle than an improvement for us...

Edit: I didn't want to create a dispute here but it's good to here some experience and get the different positions.
Last edited by DSUser2000 on Thu Sep 16, 2010 1:35 pm, edited 2 times in total.
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ
Contact:

Post by mhester »

Franklin,

My response was, indeed, meant to be constructive and I am happy that you took it that way. I am a proponent of using the right tool for the job and if it means RCP then I am happy to do so and if not then so be it. I am comfortable with all of the stages on the palette including the restructure stages and have used them quite a bit with various customers and many of them are RCP only stages.

I agree with you that RCP jobs are more difficult to trouble shoot, but I am aware of plans within IBM to tighten this up and make it easier to debug. You can already see some of this with their latest 8.1 release and 8.5 will have even more integration.

There is no need for anyone to remove or edit a post - your original post is free speech and that is a wonderful thing and my response was simply to caution all of us collectively that RCP is not evil and can be a wonderful tool in our toolbox.

Thanks for the reply
FranklinE
Premium Member
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

mhester wrote:Franklin,

My response was, indeed, meant to be constructive and I am happy that you took it that way.
Thanks, Mike. This is not the first time I've created misunderstanding by taking a conversational appoach by including references to my mood not necessarily clear in my writing, and not necessarily appropriate to the topic.

Keith, I apologize. My referring to your "offhand" post, in my thoughts if not in my writing, was out of admiration for your point and not to criticize it.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
Post Reply