Sample Stage Output Not As Expected

wfis
Premium Member
Posts: 70
Joined: Wed Feb 28, 2007 2:38 am
Location: India

Post by wfis »

All,

I have a job in which I am sampling records using a Sample stage.

I am using Percent mode to sample the data. Below are the properties I have set:

Percent = 5.0
Sample Mode = Percent

Input to Sample Stage: 261444 Rows
Output Of Sample Stage: 12895 Rows

Expected Output: 261444 * 5 / 100 = 13072.2 (about 13072 rows)

The input data is hash partitioned and sorted on a key column whose values are all unique.

I am using DS Version 8.0.

Can someone help me understand why the Sample stage's output count differs from the expected count? Am I missing another property that needs to be set, such as Seed?

Please suggest. Thanks in advance!
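
A rough way to judge the gap between 12895 and 13072.2 is sketched below. It assumes that Percent mode keeps each row independently with probability Percent/100, which would make the output count a binomial random variable rather than an exact 5%. That is an assumption about the stage, not something confirmed in this thread, and the helper name `percent_sample_stats` is made up for illustration.

```python
import math

def percent_sample_stats(n_rows, percent):
    """Mean and standard deviation of the output row count, ASSUMING
    each row is kept independently with probability percent/100."""
    p = percent / 100.0
    mean = n_rows * p
    std = math.sqrt(n_rows * p * (1.0 - p))
    return mean, std

n_rows = 261444      # input row count from the post
observed = 12895     # output row count from the post

mean, std = percent_sample_stats(n_rows, 5.0)
z = (observed - mean) / std  # distance from the mean in standard deviations

print(f"expected rows:      {mean:.1f}")
print(f"standard deviation: {std:.1f}")
print(f"observed {observed} is {z:.2f} standard deviations from the mean")
```

Under that assumption the standard deviation is a little over 110 rows, so a count of 12895 sits roughly 1.6 standard deviations below the mean. That is within the range ordinary random variation can produce, so it does not by itself point to a missing property.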
wfis
Premium Member
Posts: 70
Joined: Wed Feb 28, 2007 2:38 am
Location: India

Post by wfis »

Well, I added one more property to the Sample stage, and the output is now closer to what I expected:

With Seed = 1, the output row count is 13077, which is much closer to the expected count.

What is the significance of this property, and how does it affect the output count?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

The seed is for the pseudo-random number generator; with the same seed value, the output of the sample will always be identical. If you start with a 1-node configuration, is the output sample what you expect?
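
The effect of a seed can be illustrated with a small Python stand-in. This is only a sketch: it uses Python's `random.Random`, not DataStage's generator, so it will not reproduce the row counts above, and the function `percent_sample` and its per-row selection logic are assumptions made for illustration.

```python
import random

def percent_sample(rows, percent, seed=None):
    """Keep each row with probability percent/100 (assumed behaviour).
    A fixed seed makes the sequence of random draws repeatable."""
    rng = random.Random(seed)
    p = percent / 100.0
    return [row for row in rows if rng.random() < p]

rows = range(261444)  # stand-in for the input rows

first = percent_sample(rows, 5.0, seed=1)
second = percent_sample(rows, 5.0, seed=1)  # same seed: identical sample
other = percent_sample(rows, 5.0, seed=2)   # different seed: different sample

print(len(first), len(second), len(other))
print("same seed gives same rows:", first == second)
```

In this sketch the seed only fixes which rows are drawn. It does not pull the count toward 5%: any seed gives a count near 13072, and a given seed gives the same count on every run.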
wfis
Premium Member
Posts: 70
Joined: Wed Feb 28, 2007 2:38 am
Location: India

Post by wfis »

I have not tried it. What I can say is that the same configuration gives one output count on DS 7.5 and a different output count on DS 8.0. I am not sure why.