Input not sorted at record

jweir · Post by **jweir** » Thu Jan 16, 2014 10:25 pm

Hi all,

I recently ran into a problem when trying to sort within Netezza and preserving the sort. This same job has ran many times without fail and now when I am testing with a billion rows, it fails.

My job is set up to sort y data in my Netezza query using a Netezza Connector. After using the ORDER BY in the source query, I use a sort stage right after the Netezza Connector to preserve sort (I found DataStage didnt preserve sort if I did not add in the sort stage). In the sort stage, I use Same partitioning and I also specify the same exact columns I used in my ORDER BY (I have checked this several times now and they are in the same order as in my query). I set the sort method to Don't Sort Previously Sorted. Like I mentioned, this has worked and has been tested previously, even with 500 million records. But now with a billion row test, it fails. And it fails at the same record every time:

Code: Select all

srt_KeepOrder,0: Input not sorted at record 15046990

I am running on a 4 node configuration.

Does anyone have any thoughts as to why this is happening now?

Thanks in advance.

chulett · Post by **chulett** » Fri Jan 17, 2014 7:45 am

Have you checked the actual values around that record to see the nature of the issue? Perhaps Netezza is performing a string sort and DataStage is expecting a numeric one. You really need to see the sequence of sorted values to figure out what's going on, even if it is bajillions of records.

jweir · Post by **jweir** » Fri Jan 17, 2014 8:14 am

Thanks Craig.

That is actually what we are doing now. We are going to output the results of the Netezza query to a file and the results after the sort is being done in DataStage to compare.

I have since changed the job to sort within DataStage and it was successful. However I am still going to try and figure out as to why this happened since sorting in Netezza is much faster. It is a possibility that I may not be able to sort within Netezza given the data.

I will post the results of my findings later.

chulett · Post by **chulett** » Fri Jan 17, 2014 9:50 am

FWIW, I've been in your shoes. Sort in the source, assert that it is already sorted in the job and everything tests out fine, volume testing in UAT/QAT works well too. It spends a short time in Production and then blam! falls over dead... something comes in that we'd never seen before in a sort field and it doesn't sort the way DataStage thinks it should. [sigh]

In our case I seem to recall having to cast the source field from whatever it was to something else (string->number or vice versa) and then handling the data type as the new type in job to make DS happy. Be curious what yours turns out to be and what your options are to solve it.

jweir · Post by **jweir** » Fri Jan 17, 2014 11:13 am

Interesting. Glad I am not alone here then

We are sorting by many different fields, including address and name fields (trust me, I did not create the requirements) so my thinking is some odd character came into the source that throws off the sort.

Stay tuned.

jweir · Post by **jweir** » Fri Jan 17, 2014 2:34 pm

May have found the culprit. Seems there is an unprintable character in one of the sort columns that throws off DataStage when we pull the records out of Netezza. When outputted to a file and do a 'vi', we can see this at the end of the column in question:

Code: Select all

^/

And when I output it to CSV format and open in Excel, I can see there is a space at the end of the field but obviously it is not a space.

I tried to go a regular expression replace on this, but still no luck.

Any thoughts on how to rid of it?

chulett · Post by **chulett** » Fri Jan 17, 2014 2:44 pm

You need to check it in a hex editor or do an 'octal dump' on the UNIX side and determine what the actual hexadecimal code(s) is.

jweir · Post by **jweir** » Fri Jan 17, 2014 2:46 pm

Looks like the HEXA value is 28?

http://sourceforge.net/p/notepad-plus/d ... /ca23a0ef/

What needs to be done when this is found?

jweir · Post by **jweir** » Fri Jan 17, 2014 2:56 pm

I've tried to regex replaces:

Code: Select all

trim(system..replace(TempAcct.NameAddress1,'\x1C',''))

trim(system..replace(TempAcct.NameAddress1,'\x28',''))

But still no luck when trying to use the Netezza sort...

chulett · Post by **chulett** » Fri Jan 17, 2014 10:28 pm

You're trying to do that in the source? See if the chr() function helps, give it the decimal value of the problem character. Use it rather than your hex notation in the trim.

jweir · Post by **jweir** » Wed Jan 22, 2014 8:24 am

It seems the File Separator was the ultimate culprit in creating this error. I have switched to sorting on the DataStage server instead of Netezza to avoid this error. I could not convert the unprintable character in the source, however I think it is safer to sort in DataStage rather than trying to convert an unprintable character every time it arrives.

I will mark this as a Workaround.