reconciliation of source data and target data

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
yuiduan
Participant
Posts: 15
Joined: Tue Oct 26, 2004 1:06 am

reconciliation of source data and target data

Post by yuiduan »

Hi
We use datastage to load data from DB2 to Oracle incrementally. From time to time, we compare db2 tables with their oracle equevalence to make sure they are identical. My approach is convert a db2 table to a flat file, convert its equivalant oracle table to another falt file, then use some compare tool (e.g. TOAD compare function) to compare two flat files. There are hundreds of tables to be compared. I found the comparing process I used is too manual and painful.
I am thinking of another way: use the db2 and oracle tables as inputs to a Universe stage to create/populate two universe tables, and then use universe sql to compare the two tables, store the comparison reault to a third table. I don't know if the universe database has set operation (such as minus, union etc).
What do you think about the above idea?
If it is dorable, where can I find Universe database syntax documents?
Any other options you can suggest?

Thanks a lot.
Yui
rasi
Participant
Posts: 464
Joined: Fri Oct 25, 2002 1:33 am
Location: Australia, Sydney

Post by rasi »

I am not sure whether you can do this in Universe or not. But I used datastage to do this reconcillation. I created a simple multi instance job which receives two source files and the first column being key column in both the file and with the same metadata. I use one file as hash lookup and another one as main file and do comparision for all the columns. If it doesn't match then it puts to a reject file. Also if records not found in either of the source can be reported to the reject file.

What you need to do is to feed two source file from your two tables to job and run. The query used in those two tables must have all the columns same in the right order.

I am using this component and finding it very useful for me in doing those sort of reconcillation.

Cheers
Siva
ecclesr
Premium Member
Premium Member
Posts: 260
Joined: Sat Apr 05, 2003 7:12 pm
Location: Australia

Post by ecclesr »

The following Universe example will produce the same equivalent result as Oracle using MINUS by using the NOT EXISTS keywords with a sub query eg to request billing address for all customers who orders have been satisfied

>SELECT BILL.TO FROM CUSTOMERS
SQL+WHERE NOT EXISTS (SELECT * FROM ORDERS
SQL+WHERE CUST.NO = CUSTOMERS.CUST.NO);

The Oracle equivalent using MINUS

SELECT BILL.TO FROM CUSTOMERS
MINUS
(SELECT BILL.TO FROM ORDERS
WHERE CUST.NO = CUSTOMERS.CUST.NO);


Ross
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

UniVerse has standard set operations. These are UNION, INTERSECT and DIFFERENCE.

Sorry to disappoint the Oraclers, but MINUS is Oracle only.

There are obviously two difference DIFFERENCE results, but the question ought to return 0 rows if the tables are equivalent.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
siddharthkaushik
Participant
Posts: 20
Joined: Thu Nov 27, 2003 3:45 am
Location: New Delhi, India

Sub Query execution in UniVerse

Post by siddharthkaushik »

Can UniVerse evaluate such a query:

SELECT A.BU_KEY,A.PERIOD_KEY,2 as UOM_KEY,CASE WHEN B.SUBS_VAL=0 THEN 0 ELSE (A.REV_VAL/B.SUBS_VAL*1000000) END AS ARPU_VAL FROM (SELECT BU_KEY,PERIOD_KEY,2 as UOM_KEY,SUM(REV_VAL) as REV_VAL FROM EISDWH.REVENUE_FACT WHERE REV_KEY in(167,170,172) AND IS_LATEST='1' AND SCENARIO_KEY=1 group by BU_KEY,PERIOD_KEY,UOM_KEY) A, (SELECT BU_KEY,PERIOD_KEY,2 as UOM_KEY,SUBS_VAL FROM EISDWH.SUBS_FACT WHERE SUBS_KEY=24 AND RANGE_KEY=0 AND PROD_KEY=3 and NETWORK_OPTR_KEY=9 AND IS_LATEST='1' AND SCENARIO_KEY=1) B WHERE A.BU_KEY=B.BU_KEY AND A.PERIOD_KEY=B.PERIOD_KEY;

I have been trying to make this query work for sometime now with few changes, but no success as yet....

I have unloaded two tables (SUBS_FACT and REVENUE_FACT) in respective Hashed files. Then I am trying to evaluate revenue per subscriber based on conditions. The query works in ODBC Stage, but the performance is not good. In order to boost the performance, I am trying this route, but still stuck with query not working in UniVerse.

Thanks,
Siddharth
I would more readily fail completely in something that I truley beleive in, rather than to succeed overwhelmingly in something that goes against the ideals I have set for myself.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You would use EVAL rather than WHEN construct but otherwise UniVerse SQL can do that kind of query. Indexing the constraining columns would help, as it would in any database.

It would be more efficient in UniVerse to use Select Lists on which you can perform in-memory set operations (UNION, INTERSECTION, DIFFERENCE) using the MERGE.LIST command.

You can not use a Hashed File stage for any kind of lookup but "=". Specifically in this case a Hashed File stage does not support an IN operator in a lookup.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

I would use crc32. Do a search on how to use this function. I would think this would be the fastest way to compare all fields.
Mamu Kim
djm
Participant
Posts: 68
Joined: Wed Mar 02, 2005 3:42 am
Location: N.Z.

Post by djm »

Yui,

can you just confirm for me that, as per your message description, your DataStage server is a Windows server and not a UNIX server. If it is a Windows server I don't have an alternative to offer but if it is a UNIX server, there is another option that I can suggest.

David
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Suggest away, David. You get MKS Toolkit with DataStage 7.5 on Windows. So you can do UNIX stuff.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
djm
Participant
Posts: 68
Joined: Wed Mar 02, 2005 3:42 am
Location: N.Z.

Post by djm »

I was trying not to presume about any flavour of UNIX tools on a Windows server since that is not necessarily a standard scenario. But suggest away I will ...

There could be a fairly straight forward UNIX solution on the premise that an export from both Oracle and DB2 of the respective tables can be achieved on the command line (e.g. commands piped to PL/SQL).

The solution outline would be:

Code: Select all

  for each table
    export oracle table to flat file
    export DB2 table to flat file
    compare the files producing two files;
      one that holds data that is in one file but not in the other
      one that holds data in the other file but not in the first
So a pseudo-UNIX ksh script to do this for three tables would be:

Code: Select all

  for tabname in table1 table2 table3
  do
    echo $tabname
    command line to invoke Oracle tool to export table $tabname.* to file1
    command line to invoke DB2 tool to export table $tabname.* to file2
    sort -o $tabname.file1 file1
    sort -o $tabname.file2 file2
    comm -13 $tabname.file1 $tabname.file2 > $tabname.only_in_file2
    comm -23 $tabname.file1 $tabname.file2 > $tabname.only_in_file1
  done
I am ignorant of the command required for the respective tools, so the approach does depends on the existence of such tools. The sort step is included as using an ORDER BY clause on an export may require knowledge of all the columns in a particular table whereas using the UNIX sort command circumvents the need for this. You pay your money and you take your choice.

The UNIX comm command ("man comm") allows a comparison to be made between two files and records common to both, only in one file or only in the other file to be written to standard output.

Hopefully this is enough of an overview for you to consider whether or not it is an option that would work for you.

David
jzparad
Charter Member
Charter Member
Posts: 151
Joined: Thu Apr 01, 2004 9:37 pm

Post by jzparad »

Ray,

Should the MKS toolkit be automatically installed with the DataStage server?

I'm running 7.5.1 on Windows and I can't seem to find an MKS directory.
Jim Paradies
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Aarrgh, it's DataStage EE. Sorry about that.

Anyway, you can get MKS Toolkit or CYGWIN or any of those UNIX emulators without having to get them with DataStage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Maybe you should get a price on a WebSphere Information Integrator license, this will let you write SQL to compare the two tables across database platforms. It also gives you a lot more options for your DataStage jobs such as unstructured data sources and cross database query sources.
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

CRC32
Mamu Kim
Post Reply