DSXchange: DataStage and IBM Websphere Data Integration Forum
View next topic
View previous topic
Add To Favorites
Author Message
mouthou
Participant



Joined: 04 Jul 2004
Posts: 208

Points: 1991

Post Posted: Wed Jan 16, 2019 3:09 am Reply with quote    Back to top    

DataStage® Release: 11x
Job Type: Parallel
OS: Unix
Hi All,

We have about 700-800 existing Teradata tables which required to be migrated to a different Teradata DB server and compare the new data set with the existing set as and when DataStage jobs run in both the Teradata DBs.

Having left with no choice on the DB side as there is no database link etc to use to do a DIFF, interested to explore some options on having a single DataStage job which does the DIFFERENCE logic. Thought to check with the folks here for some clarifications or related inputs. Appreciate any details on this.

- Though we can pass the runtime SQL file to Teradata Connector stage for each table but the major issue is with the metadata. Is there an option we can bypass column definition (otherwise it is as worst as creating 700-800 jobs for this comparision)
- Alternatively can a Teradata Connector stage be used with a runtime schema file for the metadata and use RCP feature like done with Sequential files?
chulett

Premium Poster


since January 2006

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 12 Nov 2002
Posts: 42998
Location: Denver, CO
Points: 221842

Post Posted: Wed Jan 16, 2019 5:39 am Reply with quote    Back to top    

I don't have any specific thoughts to share this morning other than to note that RCP is not limited to sequential files. And if there's no database link and if that would help simplify this process, have one created. Temporarily, if nothing else.

_________________
-craig

I swore that I would love you to the end of time, so now I'm praying for the end of time to hurry up and arrive
'Cause if I gotta spend another minute with you I don't think that I can really survive
Rate this response:  
Not yet rated
PaulVL



Group memberships:
Premium Members

Joined: 17 Dec 2010
Posts: 1277

Points: 8399

Post Posted: Wed Jan 16, 2019 7:54 am Reply with quote    Back to top    

Do you have to validate each row or does a "sampling" suffice?


You can extract from a table (based upon keys) sort the data and then instead of dumping to a sequential file all of that data, you could output a checksum value. Then just compare both checksums.

Another option is load system #2, then do the checksum extract via SQL from both systems and use DS to compare the sorted results.
Rate this response:  
Not yet rated
ray.wurlod

Premium Poster
Participant

Group memberships:
Premium Members, Inner Circle, Australia Usergroup, Server to Parallel Transition Group

Joined: 23 Oct 2002
Posts: 54534
Location: Sydney, Australia
Points: 295710

Post Posted: Wed Jan 16, 2019 12:16 pm Reply with quote    Back to top    

Do you really need DataStage for this? Does not Teradata have its own table transfer utilities?

_________________
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Rate this response:  
Not yet rated
chulett

Premium Poster


since January 2006

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 12 Nov 2002
Posts: 42998
Location: Denver, CO
Points: 221842

Post Posted: Wed Jan 16, 2019 12:51 pm Reply with quote    Back to top    

Yeah... wondering much of what has also been posted. It seems like there should be a native solution to this problem but if there isn't, I'd be looking for much more detail on exactly what kind of "comparisons" you need to be doing here.

_________________
-craig

I swore that I would love you to the end of time, so now I'm praying for the end of time to hurry up and arrive
'Cause if I gotta spend another minute with you I don't think that I can really survive
Rate this response:  
Not yet rated
mouthou
Participant



Joined: 04 Jul 2004
Posts: 208

Points: 1991

Post Posted: Wed Jan 16, 2019 9:43 pm Reply with quote    Back to top    

Thanks for everyone's comments. To answer few questions from Ray and Chulett

- Yes I also expected a native solution to do within the DB but there isnt much known yet (It will be 2 different DB production servers and the options of playing around are limited). The kind of comparison is table-to-table to see if there is any difference in loading by the same DataStage migrated job on different server
- Yes we have to validate each row for any slight difference, even if the table contains 2 million rows(so PaulVL's options wont suffice)
- As Ray commented, I also thought to come up something within DB but not known yet. Thought of DataStage as the worst case since the validation job can be run from anywhere pointing to any DB, and there is no need for explicit columns assuming DataStage has RCP feature
Rate this response:  
Not yet rated
chulett

Premium Poster


since January 2006

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 12 Nov 2002
Posts: 42998
Location: Denver, CO
Points: 221842

Post Posted: Thu Jan 17, 2019 1:12 pm Reply with quote    Back to top    

Still a little lost, but... is this a "one time" process or something you'll need to run repeatedly, say after daily loads against incremental data sets? I think it's the word "runtime" in your subject that's making me ask...

_________________
-craig

I swore that I would love you to the end of time, so now I'm praying for the end of time to hurry up and arrive
'Cause if I gotta spend another minute with you I don't think that I can really survive
Rate this response:  
Not yet rated
mouthou
Participant



Joined: 04 Jul 2004
Posts: 208

Points: 1991

Post Posted: Fri Jan 18, 2019 3:47 am Reply with quote    Back to top    

I think runtime could be in any scenario, be it one time or incremental batch loads. I meant the runtime as to pass the something like tablename when the validation job runs.

This is a temporary validation process which will be run everyday on production for a month till the migration is signed off for correctness.
Rate this response:  
Not yet rated
chulett

Premium Poster


since January 2006

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 12 Nov 2002
Posts: 42998
Location: Denver, CO
Points: 221842

Post Posted: Fri Jan 18, 2019 5:19 am Reply with quote    Back to top    

So... parallel testing, yes? Meaning for some period of time, you'll be taking the same daily data and loading it into two different database instances by running the same suite of jobs twice, once for each target? Or have you migrated DataStage there as well? Just saw the "the same DataStage migrated job on different server" comment. And then afterwards, you need to somehow compare the results to ensure they are identical. Just want to make sure we're all on the same page with a clear set of requirements.

Regardless, I don't see what about Paul's options "won't suffice". Sure, the sampling perhaps but nothing in what was posted after that was specific to sampling. High level checks could be simple row counts that would reveal an obvious / gross difference. Other than that, you'll need something manageable to work with at the gory details level. I would think that creating a checksum of the data elements in each record, identified by that table's unique key and dumping that for each target would be a manageable way to go. Both sets could be brought together (using DataStage if you like), the keys joined and the checksums compared. Missing keys and/or checksum mismatches could then be further investigated.

This rather than trying to compare the raw records.

_________________
-craig

I swore that I would love you to the end of time, so now I'm praying for the end of time to hurry up and arrive
'Cause if I gotta spend another minute with you I don't think that I can really survive
Rate this response:  
Not yet rated
Display posts from previous:       

Add To Favorites
View next topic
View previous topic
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum



Powered by phpBB © 2001, 2002 phpBB Group
Theme & Graphics by Daz :: Portal by Smartor
All times are GMT - 6 Hours