Input not sorted at record

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
jweir
Participant
Posts: 134
Joined: Wed Aug 04, 2010 12:31 pm

Input not sorted at record

Post by jweir »

Hi all,

I recently ran into a problem when trying to sort within Netezza and preserving the sort. This same job has ran many times without fail and now when I am testing with a billion rows, it fails.

My job is set up to sort y data in my Netezza query using a Netezza Connector. After using the ORDER BY in the source query, I use a sort stage right after the Netezza Connector to preserve sort (I found DataStage didnt preserve sort if I did not add in the sort stage). In the sort stage, I use Same partitioning and I also specify the same exact columns I used in my ORDER BY (I have checked this several times now and they are in the same order as in my query). I set the sort method to Don't Sort Previously Sorted. Like I mentioned, this has worked and has been tested previously, even with 500 million records. But now with a billion row test, it fails. And it fails at the same record every time:

Code: Select all

srt_KeepOrder,0: Input not sorted at record 15046990
I am running on a 4 node configuration.

Does anyone have any thoughts as to why this is happening now?

Thanks in advance.
Jweir

--- If strength were all, tiger would not fear scorpion.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Have you checked the actual values around that record to see the nature of the issue? Perhaps Netezza is performing a string sort and DataStage is expecting a numeric one. You really need to see the sequence of sorted values to figure out what's going on, even if it is bajillions of records.
-craig

"You can never have too many knives" -- Logan Nine Fingers
jweir
Participant
Posts: 134
Joined: Wed Aug 04, 2010 12:31 pm

Post by jweir »

Thanks Craig.

That is actually what we are doing now. We are going to output the results of the Netezza query to a file and the results after the sort is being done in DataStage to compare.

I have since changed the job to sort within DataStage and it was successful. However I am still going to try and figure out as to why this happened since sorting in Netezza is much faster. It is a possibility that I may not be able to sort within Netezza given the data.

I will post the results of my findings later.
Jweir

--- If strength were all, tiger would not fear scorpion.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

FWIW, I've been in your shoes. Sort in the source, assert that it is already sorted in the job and everything tests out fine, volume testing in UAT/QAT works well too. It spends a short time in Production and then blam! falls over dead... something comes in that we'd never seen before in a sort field and it doesn't sort the way DataStage thinks it should. [sigh]

In our case I seem to recall having to cast the source field from whatever it was to something else (string->number or vice versa) and then handling the data type as the new type in job to make DS happy. Be curious what yours turns out to be and what your options are to solve it. :)
-craig

"You can never have too many knives" -- Logan Nine Fingers
jweir
Participant
Posts: 134
Joined: Wed Aug 04, 2010 12:31 pm

Post by jweir »

Interesting. Glad I am not alone here then 8)

We are sorting by many different fields, including address and name fields (trust me, I did not create the requirements) so my thinking is some odd character came into the source that throws off the sort.

Stay tuned.
Jweir

--- If strength were all, tiger would not fear scorpion.
jweir
Participant
Posts: 134
Joined: Wed Aug 04, 2010 12:31 pm

Post by jweir »

May have found the culprit. Seems there is an unprintable character in one of the sort columns that throws off DataStage when we pull the records out of Netezza. When outputted to a file and do a 'vi', we can see this at the end of the column in question:

Code: Select all

^/
And when I output it to CSV format and open in Excel, I can see there is a space at the end of the field but obviously it is not a space.

I tried to go a regular expression replace on this, but still no luck.

Any thoughts on how to rid of it?
Jweir

--- If strength were all, tiger would not fear scorpion.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You need to check it in a hex editor or do an 'octal dump' on the UNIX side and determine what the actual hexadecimal code(s) is.
-craig

"You can never have too many knives" -- Logan Nine Fingers
jweir
Participant
Posts: 134
Joined: Wed Aug 04, 2010 12:31 pm

Post by jweir »

Looks like the HEXA value is 28?

http://sourceforge.net/p/notepad-plus/d ... /ca23a0ef/

What needs to be done when this is found?
Jweir

--- If strength were all, tiger would not fear scorpion.
jweir
Participant
Posts: 134
Joined: Wed Aug 04, 2010 12:31 pm

Post by jweir »

I've tried to regex replaces:

Code: Select all

trim(system..replace(TempAcct.NameAddress1,'\x1C',''))

trim(system..replace(TempAcct.NameAddress1,'\x28',''))
But still no luck when trying to use the Netezza sort...
Jweir

--- If strength were all, tiger would not fear scorpion.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You're trying to do that in the source? See if the chr() function helps, give it the decimal value of the problem character. Use it rather than your hex notation in the trim.
-craig

"You can never have too many knives" -- Logan Nine Fingers
jweir
Participant
Posts: 134
Joined: Wed Aug 04, 2010 12:31 pm

Post by jweir »

It seems the File Separator was the ultimate culprit in creating this error. I have switched to sorting on the DataStage server instead of Netezza to avoid this error. I could not convert the unprintable character in the source, however I think it is safer to sort in DataStage rather than trying to convert an unprintable character every time it arrives.

I will mark this as a Workaround.
Jweir

--- If strength were all, tiger would not fear scorpion.
Post Reply