How does a join stage work?

Manu1 · Post by **Manu1** » Wed Jul 14, 2010 10:17 pm

Hi
Could anyone explain how a Join stage works from a conceptual perspective?

If I have some 10 million records on each input, how and where does the Join stage store that data in order to compare rows and find matches? How does it make use of memory, buffers, etc.?

Please let me know if you need more details.

chulett · Post by **chulett** » Wed Jul 14, 2010 10:47 pm

Please post in the correct forum going forward. I've moved your two posts here; any other questions specific to the EE/PX product belong here as well.

ray.wurlod · Post by **ray.wurlod** » Wed Jul 14, 2010 11:19 pm

The Join stage requires sorted inputs. It loads the rows having the first join key value from both inputs into memory and performs the specified type of join on that set of records (those with the first join key value). Any result is pushed onto the output link and the memory is freed. It then loads the rows having the second join key value from both inputs into memory and processes these, and so on until end-of-data is reached. By default 20MB of real memory per partition is allocated, so scratch disk ought not to be needed.
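
The key-by-key processing described above is essentially a sort-merge join. Here is a minimal Python sketch of the idea, not DataStage code: the function name `merge_join`, the tuple row layout (join key in position 0), and the inner-join behaviour are all assumptions made for illustration. The point is that only the rows sharing the current key value are ever held in memory.

```python
from itertools import groupby
from operator import itemgetter


def merge_join(left, right, key=itemgetter(0)):
    """Inner join of two row streams that are ALREADY sorted on the join key.

    Rows are assumed (for illustration) to be tuples with the key first.
    """
    done = (None, None)  # sentinel returned when an input is exhausted
    left_groups = groupby(left, key=key)
    right_groups = groupby(right, key=key)
    lkey, lrows = next(left_groups, done)
    rkey, rrows = next(right_groups, done)

    while lrows is not None and rrows is not None:
        if lkey < rkey:
            # No match possible for this left key: skip to the next one.
            lkey, lrows = next(left_groups, done)
        elif lkey > rkey:
            rkey, rrows = next(right_groups, done)
        else:
            # Buffer only the rows for this single key value.
            lbuf, rbuf = list(lrows), list(rrows)
            for l in lbuf:
                for r in rbuf:
                    yield l + r[1:]  # drop the duplicate key column
            # Buffers go out of scope here, so that memory can be reused.
            lkey, lrows = next(left_groups, done)
            rkey, rrows = next(right_groups, done)


left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
result = list(merge_join(left, right))
```

Because both inputs arrive sorted, each is read exactly once, front to back. Memory use depends on the largest group of rows sharing one key value, not on the total row count, which is why 10 million rows per input need not be a problem unless a single key value is heavily duplicated.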