Full outer join gives more output record count than inputs

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Nagasudheerkumar
Premium Member
Premium Member
Posts: 55
Joined: Tue Apr 24, 2007 1:26 am

Full outer join gives more output record count than inputs

Post by Nagasudheerkumar »

Code: Select all

                                B(table) 
                                   | 
A(table)--->Transformer ---->Join Stage(fullouterJoin)------>Transformer -------> Matched                                                                                              
                                                                  |
                                                           Unmatched from B(Tfm)   


Above is my job design:

Table A contains 10 columns in which 5 columns combination gives a key column (ID).
My 5 cols are numbers(Oracle DB) when imported the metadata is decimal,
after combining 5 columns to create(1st Transformer) a char(20) as key for A table.
After Full outer join I am getting more record count from Join Stage.
Table B contains 6 cols in which 1 column is key(ID)
I have partitioned with HASH on key column(ID) from left and Right links. with Sorting.


Below are the details:

Table A count: 5000
Table B count: 696581
After Join Stage: 710928 instead of 701581(A+B).

Both key columns are having duplicate values.

can anyone let me know why the count is coming more from Join Stage after full outer join?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Because you have duplicate records. Each duplicate is joined to all its counterparts. Worst case, with M records on one input and N on the other, and only one key value, would be an output having M * N records.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply