I recently ran into a problem when trying to sort within Netezza and preserving the sort. This same job has ran many times without fail and now when I am testing with a billion rows, it fails.
My job is set up to sort y data in my Netezza query using a Netezza Connector. After using the ORDER BY in the source query, I use a sort stage right after the Netezza Connector to preserve sort (I found DataStage didnt preserve sort if I did not add in the sort stage). In the sort stage, I use Same partitioning and I also specify the same exact columns I used in my ORDER BY (I have checked this several times now and they are in the same order as in my query). I set the sort method to Don't Sort Previously Sorted. Like I mentioned, this has worked and has been tested previously, even with 500 million records. But now with a billion row test, it fails. And it fails at the same record every time:
Have you checked the actual values around that record to see the nature of the issue? Perhaps Netezza is performing a string sort and DataStage is expecting a numeric one. You really need to see the sequence of sorted values to figure out what's going on, even if it is bajillions of records.
-craig
"You can never have too many knives" -- Logan Nine Fingers
That is actually what we are doing now. We are going to output the results of the Netezza query to a file and the results after the sort is being done in DataStage to compare.
I have since changed the job to sort within DataStage and it was successful. However I am still going to try and figure out as to why this happened since sorting in Netezza is much faster. It is a possibility that I may not be able to sort within Netezza given the data.
I will post the results of my findings later.
Jweir
--- If strength were all, tiger would not fear scorpion.
FWIW, I've been in your shoes. Sort in the source, assert that it is already sorted in the job and everything tests out fine, volume testing in UAT/QAT works well too. It spends a short time in Production and then blam! falls over dead... something comes in that we'd never seen before in a sort field and it doesn't sort the way DataStage thinks it should. [sigh]
In our case I seem to recall having to cast the source field from whatever it was to something else (string->number or vice versa) and then handling the data type as the new type in job to make DS happy. Be curious what yours turns out to be and what your options are to solve it.
-craig
"You can never have too many knives" -- Logan Nine Fingers
We are sorting by many different fields, including address and name fields (trust me, I did not create the requirements) so my thinking is some odd character came into the source that throws off the sort.
Stay tuned.
Jweir
--- If strength were all, tiger would not fear scorpion.
May have found the culprit. Seems there is an unprintable character in one of the sort columns that throws off DataStage when we pull the records out of Netezza. When outputted to a file and do a 'vi', we can see this at the end of the column in question:
You're trying to do that in the source? See if the chr() function helps, give it the decimal value of the problem character. Use it rather than your hex notation in the trim.
-craig
"You can never have too many knives" -- Logan Nine Fingers
It seems the File Separator was the ultimate culprit in creating this error. I have switched to sorting on the DataStage server instead of Netezza to avoid this error. I could not convert the unprintable character in the source, however I think it is safer to sort in DataStage rather than trying to convert an unprintable character every time it arrives.
I will mark this as a Workaround.
Jweir
--- If strength were all, tiger would not fear scorpion.