Hashed File in a Server Shared Container for read access PX

jmessiha
Participant
Posts: 21
Joined: Fri Nov 12, 2004 9:48 am

Post by jmessiha »

I created a hashed file in a server job and want to look it up from within a parallel job after it is created. I put the hashed file into a server shared container, put that container into my parallel job, and it seems like it will work. So far so good.

My question is: while the parallel job is accessing (read-only) the hashed file inside the server shared container, will it inadvertently run in sequential mode? What problems should I be aware of?

Thanks in advance
trokosz
Premium Member
Posts: 188
Joined: Thu Sep 16, 2004 6:38 pm

Post by trokosz »

Server shared containers run serially, but the stages connected to them outside the shared container will still run in parallel as designed. That's about it...
jmessiha
Participant
Posts: 21
Joined: Fri Nov 12, 2004 9:48 am

Post by jmessiha »

So you're saying that there will be no bottleneck while performing a parallel lookup on a hashed file inside a server shared container?
trokosz
Premium Member
Posts: 188
Joined: Thu Sep 16, 2004 6:38 pm

Post by trokosz »

You're not doing a parallel lookup at all when the hashed file sits in a shared container. It's serial. You take the same performance hit you would have taken in a server job; in other words, you gain nothing from the parallel job in this area. Hashed files don't exist in parallel jobs, so you have no choice but a server job or a shared container standing in for a server job.
jmessiha
Participant
Posts: 21
Joined: Fri Nov 12, 2004 9:48 am

Post by jmessiha »

Are you saying that each individual partition in the parallel stream will have to wait its turn to read the hashed file? They cannot read the same file at the same time?
trokosz
Premium Member
Posts: 188
Joined: Thu Sep 16, 2004 6:38 pm

Post by trokosz »

Yes
jmessiha
Participant
Posts: 21
Joined: Fri Nov 12, 2004 9:48 am

Post by jmessiha »

I was thinking that because the parallel partitions were only reading the hashed file, it wouldn't be a problem (as opposed to writing to it). Are you sure that it will bottleneck there? If so, what alternative solutions would you recommend?
T42
Participant
Posts: 499
Joined: Thu Nov 11, 2004 6:45 pm

Post by T42 »

Land it to a sequential file in the server job. In PX, read that file and use a regular Lookup stage.
jmessiha
Participant
Posts: 21
Joined: Fri Nov 12, 2004 9:48 am

Post by jmessiha »

T42 wrote: Land it to a sequential file in server. In PX, pull it and use a regular lookup stage.
Won't that result in the same or poorer performance than the hashed file in a server container?
Eric
Participant
Posts: 254
Joined: Mon Sep 29, 2003 4:35 am

Post by Eric »

I think the standard lookup options in PX will read the lookup table once into memory and then perform the lookups (from each node) against the data in memory.
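
To illustrate the "load once, then look up from every node" idea, here is a minimal Python sketch. It is not how DataStage implements its Lookup stage; the reference data, the function names, and the use of threads as stand-ins for nodes are all assumptions made for illustration.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor

# Hypothetical reference data, standing in for the landed lookup file.
REFERENCE_CSV = "key,value\n1,alpha\n2,beta\n3,gamma\n"


def load_reference(text):
    """Read the reference rows once into an in-memory dict."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["key"]: row["value"] for row in reader}


def lookup_partition(table, keys):
    """Look up one partition's keys against the shared, read-only table."""
    return [(k, table.get(k)) for k in keys]


def run_lookup(reference_text, partitions):
    """Load the table once, then serve every partition from memory."""
    table = load_reference(reference_text)  # loaded before any lookups start
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        # Each worker plays the role of one node; none of them re-reads the file.
        return list(pool.map(lambda p: lookup_partition(table, p), partitions))


if __name__ == "__main__":
    for result in run_lookup(REFERENCE_CSV, [["1", "3"], ["2", "9"]]):
        print(result)
```

The point of the sketch is only that the file is read a single time; after that, each partition works against memory rather than waiting on a shared file.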
T42
Participant
Posts: 499
Joined: Thu Nov 11, 2004 6:45 pm

Post by T42 »

jmessiha wrote:
T42 wrote: Land it to a sequential file in server. In PX, pull it and use a regular lookup stage.
Won't that result in the same or poorer performance than the hashed file in a server container?
It will outperform the hashed file within a server container, especially if you have decent disk I/O throughput, and assuming that your lookup data is not too large (comfortably smaller than your physical memory).

Remember, using a lookup stage can be done in parallel. Throw a lot of nodes at the job (enough for your server to hum nicely), ensure that the lookup stage is partitioned as auto (or entire), and you should notice a significant performance boost right after you finish loading the lookup data.
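
As a rough model of why Entire partitioning on the reference link lets the lookup scale with node count, consider the Python sketch below. It assumes Entire means every node receives a full copy of the reference data, and it uses round robin for the main stream purely as an example; the function names are hypothetical and none of this is DataStage code.

```python
def round_robin(rows, nodes):
    """Split the main stream across nodes, one row at a time."""
    parts = [[] for _ in range(nodes)]
    for i, row in enumerate(rows):
        parts[i % nodes].append(row)
    return parts


def entire(reference, nodes):
    """Give every node its own complete copy of the reference data."""
    return [dict(reference) for _ in range(nodes)]


def lookup_job(stream, reference, nodes):
    """Run the lookup per node and collect the results."""
    stream_parts = round_robin(stream, nodes)
    ref_copies = entire(reference, nodes)
    out = []
    # Each (partition, copy) pair is independent, so nodes never wait on each other.
    for part, ref in zip(stream_parts, ref_copies):
        out.extend((key, ref.get(key)) for key in part)
    return out


if __name__ == "__main__":
    stream = ["a", "b", "c", "d", "e"]
    reference = {"a": 1, "c": 3, "e": 5}
    print(sorted(lookup_job(stream, reference, 1)))
    print(sorted(lookup_job(stream, reference, 4)))
```

Because every node holds the whole reference set, the matches come out the same no matter how the main stream is split, which is what makes adding nodes safe for this kind of lookup. The cost is memory: the reference data is held once per node, which ties back to the size caveat above.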