Rnd function returning the same value

taylor.hermann
Premium Member
Posts: 32
Joined: Wed Aug 20, 2014 11:17 am

Post by taylor.hermann »

Hello,

I have finally been stumped and need some advice.

I have the exact same issue as this post: viewtopic.php?t=128945

Basically, I created a server routine to generate some random values using the rnd() function. The function works as intended when I test the routine itself, but when I place it into a Transformer in a server job, I get the same values every run. I am not specifying a seed value, because it was my understanding that you did not need to. I thought that maybe, since I didn't specify one, DataStage is assuming one or something... but it doesn't make sense to me why I can test the routine and get random values.
The IBM knowledge article specifically says to use the rnd() function if you want to generate an unrepeatable random number sequence. Which I do!!!!

But every time I run the server job, the value 8523 becomes 6007.

I did not see a proper solution in the previous post. I understand that we can "seed" the rnd() function, but that is not the solution we need, because it is my understanding that if we seed the rnd() function, the values are all going to be essentially the same within that run.
And when I did implement a seed, each run produced different values, but all the values within a single run were converted the same way.
Like so:

Code:

8 -> 7
85 -> 70
856 -> 701
8569 -> 7015
As a side note, I'm using these random values as a workaround to mask data. So if someone could locate their input record (which would be easy), they could figure out the "randomized" values and then work out every other record. I do not need to be able to convert these values back to their original values.

Also this does not HAVE to be a server job. I just had assumed it would be easiest to implement the code into a routine. But sadly although the code works when testing the routine, it doesn't work as intended at run-time. And it could also be reusable if in the future we require other data masking requirements.

My next option is to try and implement this code into a looping variable or something in a parallel job. But would like to at least figure out the "why" behind this issue in the server job.

Any feedback or advice is greatly appreciated.

Thanks,
Taylor
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Looks like you are using the same seed every run. Check out the RANDOMIZE statement. Seed based on, for example

Code:

Date() * Time()
The default seed should be based on the system date and time; perhaps this is not working in your case.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Tell us more about what you are doing in the Job. I just built a quick Server Job in 11.3, passed a constant (using the same 8xxx) value into a Transformer for 200 rows, with Rnd(inLink.col1) and I received 200 different values that looked (albeit unscientifically) nicely distributed across the 8xxx possibilities.

Ernie
Ernie Ostic

blogit!
Open IGC is Here! (https://dsrealtime.wordpress.com/2015/0 ... ere/)
taylor.hermann
Premium Member
Posts: 32
Joined: Wed Aug 20, 2014 11:17 am

Post by taylor.hermann »

That's what I thought too, Ray. It seemed that even though I didn't specify a seed, it was defaulting one for me. However, that doesn't seem to happen when I run the test on the routine by itself, which is the part that drives me crazy: it works as I would like when I test the routine, but not when I put it into the Transformer in the job.

And the problem that I saw with setting the seed value, was that it seemed that the random values were all the same like I mentioned in the last post. Although it would be different for each run. I do not want all the "random" values in a single run to be the same number.

Not sure if this is the most efficient way to do this, and I'm new to writing routines to begin with. But we have requirements to "mask" data to the exact same length as the input data. So for each position in the input string, I randomize a new value for it. But here is what my routine is basically doing:
(I've chopped some other portions out as it is all behaving the same essentially)

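Code:

StringLength = Len(Arg1)
If Upcase(Arg2) = "N" Then
   RandomizedNumbers = Arg1

   * Substring positions are 1-based, so loop from 1 to the string length.
   For loopNumber = 1 To StringLength
      * Rnd(10) returns an integer from 0 to 9.
      RandomizedNumbers[loopNumber,1] = Rnd(10)
   Next loopNumber

   Ans = RandomizedNumbers
End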
So this works when testing the routine as I want. But when I run the job over and over, for each record, the values are always "randomized" exactly the same. They never change in between runs. Which is the part that is driving me insane. (number gets converted to a random value, but doesn't change between runs)

Code:

Run 1
record 1: 85 -> 75
record 2: 85 -> 22 
record 3: 85 -> 96

Run 2..... etc...
record 1: 85 -> 75
record 2: 85 -> 22 
record 3: 85 -> 96
Now, I have already tried adding a seed like "randomize Date()*Time()". Both when I test the routine and when I use the function in the job, it behaves in a manner I don't want: the numbers all get converted to the "same" set of numbers, although that set changes between runs.

Code:

Run 1
85 -> 17
850 -> 178
1111 -> 1785
55555 -> 17854

Run 2..... etc...
45 -> 86
2222 -> 8665
7777 -> 8665
55555 -> 86651 
Now the job itself is pretty simple. I'm literally just selecting a double from a database, passing that value, and another argument to the function. Then writing to a Seq file. So I feel like the issue is within the function. Or there is something I am missing....
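
One possible reading of the seeded output above, offered as an assumption rather than confirmed DataStage behaviour: if the routine re-seeds the generator on every call, and the seed expression evaluates to the same number for every row in a run (for instance because the clock value it is built from has not changed between rows), then each call restarts the same pseudo-random sequence and every result is a prefix of one digit string. The sketch below uses Python's `random` module as a stand-in for Randomize/Rnd; `mask_digits` and the seed value are hypothetical.

```python
import random

def mask_digits(value: str, seed: int) -> str:
    """Re-seed on every call, then draw one digit per input position."""
    rng = random.Random(seed)  # stand-in for calling Randomize inside the routine
    return "".join(str(rng.randrange(10)) for _ in value)

# Hypothetical seed: assume the clock-based expression produced the same
# number for every row processed in this run.
run_seed = 123456

masked = [mask_digits(v, run_seed) for v in ["85", "850", "1111", "55555"]]
print(masked)

# Each shorter result is a prefix of the longest one, which mirrors the
# 17 / 178 / 1785 / 17854 pattern in the post.
longest = max(masked, key=len)
print(all(longest.startswith(m) for m in masked))  # True
```

If this is what is happening, the generator is behaving correctly; the repetition comes from restarting it with an identical seed for each row.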
taylor.hermann
Premium Member
Posts: 32
Joined: Wed Aug 20, 2014 11:17 am

Post by taylor.hermann »

Found a workaround. My solution was just passing in another argument as the seed....
Date()*Time() + @INROWNUM. As I want each row to be seeded differently.
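
The workaround can be sketched roughly as follows, again in Python as a stand-in for the server routine. Here `run_seed` plays the role of Date()*Time() and `row_number` the role of @INROWNUM; `mask_digits` and `mask_rows` are hypothetical names, not DataStage functions.

```python
import random

def mask_digits(value: str, seed: int) -> str:
    """Re-seed with the given seed, then draw one digit per input position."""
    rng = random.Random(seed)
    return "".join(str(rng.randrange(10)) for _ in value)

def mask_rows(values, run_seed):
    """Give each row its own seed by adding the 1-based row number."""
    return [
        mask_digits(value, run_seed + row_number)
        for row_number, value in enumerate(values, start=1)
    ]

rows = ["8523852385238523"] * 3          # identical inputs on purpose
first_run = mask_rows(rows, run_seed=1000)
second_run = mask_rows(rows, run_seed=2000)
print(first_run)
print(second_run)
```

One design caveat: seeds from neighbouring runs can coincide (run seed 1000 with row 2 gives the same seed as run seed 1001 with row 1), so this separates rows within a run but does not guarantee that no two rows across runs share a sequence.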


But I would still love to know why the routine works without a seed when you test it, but not in a job. Does testing the routine automatically apply a seed? Or does DataStage assume a seed if none is given? And why does the RND() function "require" a seed?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Questions for your official support provider, I would think.
-craig

"You can never have too many knives" -- Logan Nine Fingers
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

In general to get random numbers that are in a different sequence from run to run, you have to find a way to randomize, or specify a seed, based on something that is always changing and unpredictable. I have had better luck by using Parallel jobs that bring the current time into play, with milliseconds or preferably microseconds being the part that is always rapidly changing. You can substring the final digit or do a Mod() on it. You still have to test and ensure it doesn't generate the same results every time.
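
A minimal sketch of that idea, in Python rather than a Parallel job: seed once per run from a clock with sub-second resolution, then keep drawing from the same generator instead of re-seeding for each row. `time.time_ns()` is a standard Python call; `make_run_generator` and `mask_digits_with` are hypothetical names.

```python
import random
import time

def make_run_generator(clock_ns=None):
    """Create one generator per run, seeded from a nanosecond timestamp."""
    if clock_ns is None:
        clock_ns = time.time_ns()
    return random.Random(clock_ns)

def mask_digits_with(rng, value):
    """Draw one digit per input position without re-seeding the generator."""
    return "".join(str(rng.randrange(10)) for _ in value)

rng = make_run_generator()
for value in ["85", "85", "85"]:
    # Each call continues the sequence, so rows are not forced to match.
    print(value, "->", mask_digits_with(rng, value))
```

Passing an explicit `clock_ns` makes a run reproducible, which is handy for the "test and ensure" step mentioned above.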
Choose a job you love, and you will never have to work a day in your life. - Confucius
taylor.hermann
Premium Member
Posts: 32
Joined: Wed Aug 20, 2014 11:17 am

Post by taylor.hermann »

Yeah I guess this is just a novice mistake. I guess I didn't realize you were "required" to have a seed. Didn't see any documentation mentioning needing a seed or anything. And when I was manually testing the routine it seemed to work as I had intended without one. Which added to my confusion further.

But thanks for the clarity everyone.