Runtime comparision of tables using TD connector stage?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
mouthou
Participant
Posts: 208
Joined: Sun Jul 04, 2004 11:57 pm

Runtime comparision of tables using TD connector stage?

Post by mouthou »

Hi All,

We have about 700-800 existing Teradata tables which required to be migrated to a different Teradata DB server and compare the new data set with the existing set as and when DataStage jobs run in both the Teradata DBs.

Having left with no choice on the DB side as there is no database link etc to use to do a DIFF, interested to explore some options on having a single DataStage job which does the DIFFERENCE logic. Thought to check with the folks here for some clarifications or related inputs. Appreciate any details on this.

- Though we can pass the runtime SQL file to Teradata Connector stage for each table but the major issue is with the metadata. Is there an option we can bypass column definition (otherwise it is as worst as creating 700-800 jobs for this comparision)
- Alternatively can a Teradata Connector stage be used with a runtime schema file for the metadata and use RCP feature like done with Sequential files?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I don't have any specific thoughts to share this morning other than to note that RCP is not limited to sequential files. And if there's no database link and if that would help simplify this process, have one created. Temporarily, if nothing else.
-craig

"You can never have too many knives" -- Logan Nine Fingers
PaulVL
Premium Member
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Do you have to validate each row or does a "sampling" suffice?


You can extract from a table (based upon keys) sort the data and then instead of dumping to a sequential file all of that data, you could output a checksum value. Then just compare both checksums.

Another option is load system #2, then do the checksum extract via SQL from both systems and use DS to compare the sorted results.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Do you really need DataStage for this? Does not Teradata have its own table transfer utilities?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Yeah... wondering much of what has also been posted. It seems like there should be a native solution to this problem but if there isn't, I'd be looking for much more detail on exactly what kind of "comparisons" you need to be doing here.
-craig

"You can never have too many knives" -- Logan Nine Fingers
mouthou
Participant
Posts: 208
Joined: Sun Jul 04, 2004 11:57 pm

Post by mouthou »

Thanks for everyone's comments. To answer few questions from Ray and Chulett

- Yes I also expected a native solution to do within the DB but there isnt much known yet (It will be 2 different DB production servers and the options of playing around are limited). The kind of comparison is table-to-table to see if there is any difference in loading by the same DataStage migrated job on different server
- Yes we have to validate each row for any slight difference, even if the table contains 2 million rows(so PaulVL's options wont suffice)
- As Ray commented, I also thought to come up something within DB but not known yet. Thought of DataStage as the worst case since the validation job can be run from anywhere pointing to any DB, and there is no need for explicit columns assuming DataStage has RCP feature
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Still a little lost, but... is this a "one time" process or something you'll need to run repeatedly, say after daily loads against incremental data sets? I think it's the word "runtime" in your subject that's making me ask...
-craig

"You can never have too many knives" -- Logan Nine Fingers
mouthou
Participant
Posts: 208
Joined: Sun Jul 04, 2004 11:57 pm

Post by mouthou »

I think runtime could be in any scenario, be it one time or incremental batch loads. I meant the runtime as to pass the something like tablename when the validation job runs.

This is a temporary validation process which will be run everyday on production for a month till the migration is signed off for correctness.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

So... parallel testing, yes? Meaning for some period of time, you'll be taking the same daily data and loading it into two different database instances by running the same suite of jobs twice, once for each target? Or have you migrated DataStage there as well? Just saw the "the same DataStage migrated job on different server" comment. And then afterwards, you need to somehow compare the results to ensure they are identical. Just want to make sure we're all on the same page with a clear set of requirements.

Regardless, I don't see what about Paul's options "won't suffice". Sure, the sampling perhaps but nothing in what was posted after that was specific to sampling. High level checks could be simple row counts that would reveal an obvious / gross difference. Other than that, you'll need something manageable to work with at the gory details level. I would think that creating a checksum of the data elements in each record, identified by that table's unique key and dumping that for each target would be a manageable way to go. Both sets could be brought together (using DataStage if you like), the keys joined and the checksums compared. Missing keys and/or checksum mismatches could then be further investigated.

This rather than trying to compare the raw records.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply