Page 1 of 1

Full outer join gives more output record count than inputs

Posted: Mon May 18, 2015 2:54 pm
by Nagasudheerkumar

Code: Select all

                                B(table) 
                                   | 
A(table)--->Transformer ---->Join Stage(fullouterJoin)------>Transformer -------> Matched                                                                                              
                                                                  |
                                                           Unmatched from B(Tfm)   


Above is my job design:

Table A contains 10 columns in which 5 columns combination gives a key column (ID).
My 5 cols are numbers(Oracle DB) when imported the metadata is decimal,
after combining 5 columns to create(1st Transformer) a char(20) as key for A table.
After Full outer join I am getting more record count from Join Stage.
Table B contains 6 cols in which 1 column is key(ID)
I have partitioned with HASH on key column(ID) from left and Right links. with Sorting.


Below are the details:

Table A count: 5000
Table B count: 696581
After Join Stage: 710928 instead of 701581(A+B).

Both key columns are having duplicate values.

can anyone let me know why the count is coming more from Join Stage after full outer join?

Posted: Mon May 18, 2015 3:55 pm
by ray.wurlod
Because you have duplicate records. Each duplicate is joined to all its counterparts. Worst case, with M records on one input and N on the other, and only one key value, would be an output having M * N records.