Hashed File in a Server Shared Container for read access PX

jmessiha · Post by **jmessiha** » Wed Dec 15, 2004 12:43 pm

I created a hashed file in a server job, and I want to be able to look up against it from within a parallel job after it is created. I put the hashed file into a server shared container, then put the server shared container into my parallel job, and it seems like it will work. So far so good.

My question: while the parallel job is accessing (reading only) the hashed file inside the server shared container, will it inadvertently run in sequential mode? What problems should I be aware of?

Thanks in advance

trokosz · Post by **trokosz** » Wed Dec 15, 2004 8:39 pm

Server shared containers run serially, but the other stages connected to them outside the shared container will still run in parallel as designed. That's about it...

jmessiha · Post by **jmessiha** » Wed Dec 15, 2004 9:01 pm

So you're saying that there will be no bottleneck while performing a parallel lookup on a hashed file inside a server shared container?

trokosz · Post by **trokosz** » Wed Dec 15, 2004 9:37 pm

You're not doing a parallel lookup in a shared container with a hash file at all. It's serial. You take the same performance hit you would have taken in a server job; in other words, you gain nothing from the parallel job in this area. Hash files don't exist in parallel jobs, so you have no choice but a server job or a shared container faking a server job...

jmessiha · Post by **jmessiha** » Wed Dec 15, 2004 11:13 pm

Are you saying that each individual partition in the parallel stream will have to wait its turn to read the hashed file? They cannot read the same file at the same time?

trokosz · Post by **trokosz** » Thu Dec 16, 2004 12:32 am

jmessiha · Post by **jmessiha** » Thu Dec 16, 2004 1:20 pm

I was thinking that because the parallel partitions were only reading the hashed file, it wouldn't be a problem (as opposed to writing to it). Are you sure that it will bottleneck there? If so, what alternative solutions would you recommend?

T42 · Post by **T42** » Sun Dec 19, 2004 12:48 pm

Land it to a sequential file in server. In PX, pull it and use a regular lookup stage.

jmessiha · Post by **jmessiha** » Sun Dec 19, 2004 10:18 pm

T42 wrote: Land it to a sequential file in server. In PX, pull it and use a regular lookup stage.

Won't that result in the same or poorer performance than the hashed file in a server container?

Eric · Post by **Eric** » Mon Dec 20, 2004 6:40 am

I think the standard lookup options in PX will read the lookup table once into memory and then perform the lookups (from each node) against the data in memory.
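To illustrate the idea only: the Python sketch below is a conceptual analogy, not DataStage code and not a description of how the PX engine is implemented. It assumes a small hypothetical reference file (`REFERENCE_CSV`) and made-up function names. The reference data is read once into an in-memory table, and every partition of the input stream then looks up against that same table.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor

# Hypothetical reference data, standing in for the sequential file
# landed by the server job (assumption for illustration only).
REFERENCE_CSV = "key,value\n1,alpha\n2,beta\n3,gamma\n"


def load_lookup(text):
    """Read the reference data once into a dict keyed on the lookup column."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["key"]: row["value"] for row in reader}


def lookup_partition(keys, table):
    """Look up every key of one partition against the shared in-memory table."""
    return [(key, table.get(key)) for key in keys]


def run(partitions, text=REFERENCE_CSV):
    """Load the table once, then process all partitions concurrently."""
    table = load_lookup(text)  # loaded a single time, before any lookups
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        return list(pool.map(lambda part: lookup_partition(part, table), partitions))


if __name__ == "__main__":
    # Two partitions of input keys; "9" has no match in the reference data.
    print(run([["1", "3"], ["2", "9"]]))
```

Because the table is only read after it has been loaded, the partitions do not have to take turns, which is the contrast with the serial hashed-file access discussed earlier in the thread.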

T42 · Post by **T42** » Mon Dec 27, 2004 6:38 pm

jmessiha wrote:
T42 wrote: Land it to a sequential file in server. In PX, pull it and use a regular lookup stage.
Won't that result in the same or poorer performance than the hashed file in a server container?

It will outperform the hash file within a server container, especially if you have decent disk I/O throughput, and assuming that your lookup data is not too large (comfortably smaller than your physical memory).

Remember, a lookup stage can run in parallel. Throw a lot of nodes at the job (enough for your server to hum nicely), ensure that the lookup stage is partitioned as auto (or entire), and you should notice a significant performance boost right after you finish loading the lookup data.
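As a rough sketch of why entire partitioning suits the reference data, the Python below compares two layouts. It is a conceptual analogy with made-up helper names, not DataStage code. With an entire-style layout every node holds the full reference data, so input rows can be dealt out in any order and still find their keys. If the reference data were instead split by key, the input would have to be partitioned the same way or lookups would miss.

```python
def round_robin(rows, nodes):
    """Deal input rows across nodes without regard to key."""
    return [rows[i::nodes] for i in range(nodes)]


def entire(reference, nodes):
    """Entire-style layout: every node receives the full reference data."""
    return [dict(reference) for _ in range(nodes)]


def split_by_key(reference, nodes):
    """Contrast case: each node holds only the keys that map to it."""
    parts = [{} for _ in range(nodes)]
    for key, value in reference.items():
        parts[key % nodes][key] = value
    return parts


def misses(input_parts, reference_parts):
    """Count input rows whose key is absent from that node's reference copy."""
    return sum(
        1
        for rows, ref in zip(input_parts, reference_parts)
        for key in rows
        if key not in ref
    )


if __name__ == "__main__":
    reference = {k: f"v{k}" for k in range(6)}  # hypothetical lookup data
    input_parts = round_robin([1, 0, 3, 2, 5, 4], nodes=2)
    print("entire:", misses(input_parts, entire(reference, 2)))
    print("split :", misses(input_parts, split_by_key(reference, 2)))
```

The trade-off is memory: the entire-style layout keeps a full copy of the reference data available to each node, which is why the lookup data needs to fit comfortably in memory.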