Stupid Beginners Ques re lookup stage - HELP I am too dense

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
fridge
Premium Member
Posts: 136
Joined: Sat Jan 10, 2004 8:51 am

Stupid Beginners Ques re lookup stage - HELP I am too dense

Post by fridge »

Hi, I am trying to familiarise myself with PX and have the following problem.



2 files, both with 50 records (to keep it easy):
CustData (keyed by custid (int), 1-50)
CustLookup (keyed by custid (int), 1-50)

PX job 1: load CustLookup into a lookup dataset, modulus partitioned
on custid

PX job 2: read CustData -->> transformer with modulus partitioning on custid
lookup stage referencing CustLookup - reject any not-found records
- output found records

Results:
PX job 2 outputs 12 records and rejects 38.
The 12 found records are all in partition 0 (custid = 4, 8, 12, etc.)
The 38 rejected records are in partitions 1-3 (custid = 1, 2, 3, 5, 6, 7, etc.)

I have a .apt config with 4 nodes
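For reference, a minimal four-node configuration file of the kind I'm using looks something like this (the server name and paths below are placeholders, not my actual setup):

```
{
    node "node1" {
        fastname "myserver"
        pools ""
        resource disk "/data/datasets" {pools ""}
        resource scratchdisk "/data/scratch" {pools ""}
    }
    node "node2" {
        fastname "myserver"
        pools ""
        resource disk "/data/datasets" {pools ""}
        resource scratchdisk "/data/scratch" {pools ""}
    }
    node "node3" {
        fastname "myserver"
        pools ""
        resource disk "/data/datasets" {pools ""}
        resource scratchdisk "/data/scratch" {pools ""}
    }
    node "node4" {
        fastname "myserver"
        pools ""
        resource disk "/data/datasets" {pools ""}
        resource scratchdisk "/data/scratch" {pools ""}
    }
}
```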

Question is:
I have worked out, after a bit of sleuth work with the data above, that I am not partitioning correctly. But since both the lookup dataset and the lookup stage are partitioned by mod(custid) - I have checked this with a Peek stage - how come my lookup stage only seems to find data on partition 0?


I KNOW I am missing something, but any tips / gotchas / pithy comments and abuse are welcome. I have spent an evening on this when I should have been down the pub, so I would like to get it sorted.

Thanks in advance

Fridge
T42
Participant
Posts: 499
Joined: Thu Nov 11, 2004 6:45 pm

Re: Stupid Beginners Ques re lookup stage - HELP I am too dense

Post by T42 »

fridge wrote:PX job 1: load CustLookup into a lookup dataset, modulus partitioned on custid

PX job 2: read CustData -->> transformer with modulus partitioning on custid
Please do not use manual partitioning. The Lookup stage is designed to handle partitioning on its own (in fact, by default it uses "Entire" partitioning). If you specify your own partitioning, that overrides the lookup's default behaviour.

Manual partitioning should only be used when the documentation/help files specifically say that the stage does not handle it itself. From the help files:

"There are some special partitioning considerations for lookup stages. You need to ensure that the data being looked up in the lookup table is in the same partition as the input data referencing it. One way of doing this is to partition the lookup tables using the Entire method. Another way is to partition it in the same way as the input data (although this implies sorting of the data)."

The Entire method is used by default. Set $APT_DUMP_SCORE if you want to observe the behaviour of the nodes.
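To see why this matters, here is a conceptual sketch in plain Python (not DataStage code - the node count and key range simply mirror the example above). With Entire partitioning every node holds the whole reference table, so every stream record can find its match; if the reference data instead ends up on one partition, only the stream records that happen to land there succeed:

```python
# Conceptual model of lookup partitioning: each "node" can only match
# stream records against the reference rows it holds locally.

def modulus_partition(rows, key, num_nodes):
    """Keyed partitioning: row goes to node (row[key] % num_nodes)."""
    parts = [[] for _ in range(num_nodes)]
    for row in rows:
        parts[row[key] % num_nodes].append(row)
    return parts

def entire_partition(rows, num_nodes):
    """Entire partitioning: every node gets a full copy of the table."""
    return [list(rows) for _ in range(num_nodes)]

def run_lookup(stream_parts, ref_parts):
    """Per-node lookup: match only against that node's reference rows."""
    found, rejected = [], []
    for node, stream in enumerate(stream_parts):
        ref_keys = {r["custid"] for r in ref_parts[node]}
        for row in stream:
            (found if row["custid"] in ref_keys else rejected).append(row)
    return found, rejected

cust = [{"custid": i} for i in range(1, 51)]   # custid 1..50
stream = modulus_partition(cust, "custid", 4)  # 4 nodes, mod(custid)

# Entire: all 50 reference rows on every node -> nothing rejected.
found, rejected = run_lookup(stream, entire_partition(cust, 4))
print(len(found), len(rejected))  # 50 0

# If the reference side all lands on node 0, only the stream records on
# node 0 (custid 4, 8, 12, ... 48) match: 12 found, 38 rejected - the
# exact symptom described above.
skewed_ref = [list(cust), [], [], []]
found, rejected = run_lookup(stream, skewed_ref)
print(len(found), len(rejected))  # 12 38
```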

This response assumes you are using 7.x. Lookup behaviour is different in 6.x.
fridge
Premium Member
Posts: 136
Joined: Sat Jan 10, 2004 8:51 am

Post by fridge »

Told you I was too dense for this job. Unfortunately I had been doing this by signing on from home, which is limited, as I didn't have the manuals.

Worked a treat, thanks for that
T42
Participant
Posts: 499
Joined: Thu Nov 11, 2004 6:45 pm

Post by T42 »

Just make sure it is standard practice for everything -- do not partition/sort data unless you have to. And when you think you have to, run a nice small test with a bunch of randomized data to confirm that it is indeed required.

Ascential tries to minimize the need for manual control over this aspect, especially starting with 7.x.

I have seen developers sort and partition data going into datasets. Other developers working on jobs downstream would then get very unusual results, spending days trying to fix them before turning to me for help. Once I traced the error up the flow to this and removed the sort/partitioning, the problems suddenly disappeared.
Post Reply