Custom Parallel Stagetype - Wrapped

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

rbk
Participant
Posts: 23
Joined: Wed Oct 23, 2013 1:10 am
Location: India

Custom Parallel Stagetype - Wrapped

Post by rbk »

Hi everyone,
I hope you are all doing well.
I would appreciate your expertise on the scenario below.

I am trying to create a custom parallel stage type (Wrapped) and need your help with the queries below.
1. Is it possible to design such a stage so that it works on each and every record from the source?
2. Is it possible to make it work on a specific column alone rather than on the whole record? For example, say the source has 4 columns, of which only one needs to be encoded, and then all four columns need to be passed to the output for further processing.
3. Is there a sample design or clear documentation for creating such a stage? I have been through the forum and the Advanced Developer's Guide, but I cannot follow the examples listed there (such as sort -r and tr) or the options on the Input/Output/Properties tabs. Any clearer documentation or a sample dsx would be much appreciated.


Scenario:
An Oracle table has 4 columns, including a BLOB column holding PNG images. I am trying to extract them and load them into a Hive table or a sequential file.
The downstream systems are requesting the PNG files as Base64-encoded text, since they cannot read the binary data (I am not sure where the trouble lies, i.e. in the load or the read process). Hence I was thinking of creating a custom stage type that could maybe use the xxd command to encode the column to Base64 text. The stage should work on each row read from the source.

Code: Select all

Oracle	-> Custom Wrapped Stage	->	Target (File/Hive Table)
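
Thinking about it a bit more, it seems it is base64 (rather than xxd) that actually produces Base64 text; xxd gives a hex dump. A minimal sketch of the kind of command I have in mind (assuming GNU coreutils and xxd are available; not tested on our server):

Code: Select all

base64 -w 0 image.png > image.b64    # Base64 text on a single line
xxd -p image.png > image.hex         # plain hex dump, for comparison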
Kindly guide, and please suggest if any other options can be tried.

Also, when a BLOB column is read as LongVarBinary, does it automatically convert the data to hex format?
Can an External Filter stage be used for this requirement? Will it act on each record from the source? If so, could I have a sample of the command?

Options considered:
1. Reading the BLOB column as LongVarBinary and writing to Hive. The downstream Java systems are unable to read the data and have reported it as corrupted.
2. Parallel routines: the support team may be reluctant to support C++ code.
3. BASIC routines: I was unable to find any that work; most of them return an "unrecognised character encountered" error.

Thanks in advance for all the help. I would be happy to provide any additional information on the scenario if needed.
Cheers,
RBK
jackson.eyton
Premium Member
Posts: 145
Joined: Thu Oct 26, 2017 10:43 am

Post by jackson.eyton »

To my understanding a Wrapped stage is one that passes data through to an external executable or script and reads the response from that program/script. As such, your support team would still need to maintain that. Assuming I am correct in my assumption :wink:, you should be able to parse the data from DataStage into an array, encode the column in question from that array, and then return the contents of the new array with the encoded value back to DataStage (on console output). I have no real-life experience with this, however.

Another thought: you could perform a similar process by using a series of jobs. One parallel job reads your table and outputs the data as-is to a sequential file. A script then reads that file and creates a new sequential file with the encoded values. A second parallel job reads that new sequential file and loads it into your destination database. A sequence job could be used to handle the job1 -> script -> job2 execution flow.
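
Just to sketch what that middle script might look like (untested on my part; it assumes job1 writes a pipe-delimited file with the image column last, already rendered as hex text, since raw binary does not travel well through a delimited file; the file names and layout are made up):

Code: Select all

#!/bin/sh
# Untested sketch: re-encode the last field of each record from hex text to Base64.
# Assumes GNU xxd and base64 are available and that job1 exported the BLOB as hex.
IN=extract.txt
OUT=encoded.txt
: > "$OUT"
while IFS='|' read -r id name created hexdata; do
    b64=$(printf '%s' "$hexdata" | xxd -r -p | base64 -w 0)
    printf '%s|%s|%s|%s\n' "$id" "$name" "$created" "$b64" >> "$OUT"
done < "$IN"

Very large images would make those shell variables enormous, so treat it purely as an illustration of the flow, not something to run as-is on a few-hundred-MB BLOB.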
-Me
rameshrr3
Premium Member
Posts: 609
Joined: Mon May 10, 2004 3:32 am
Location: BRENTWOOD, TN

Post by rameshrr3 »

A Wrapped stage only accepts pipe-safe commands and hence doesn't really work row by row. Inserting a Wave Generator to force row-by-row processing may not work, because the Wrapped stage is typically not "wave-aware".
Maybe try a Sequential File stage with the multiple-files option and use a filter command.
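Something along these lines as the filter command might work (untested; it assumes pipe-delimited records with the hex data in the fourth field, and for very large images the per-record command line would get far too long):

Code: Select all

awk -F'|' 'BEGIN { OFS = "|" } {
    # hex text -> bytes -> Base64 for the 4th field (field position is illustrative)
    cmd = "printf %s " $4 " | xxd -r -p | base64 -w 0"
    cmd | getline enc
    close(cmd)
    $4 = enc
    print
}'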
Is there no Oracle function you can use to encode the BLOB before you read it?
LongVarBinary should be fine. I don't know what the maximum length allowed is, but make it an unbounded LongVarBinary.
rbk
Participant
Posts: 23
Joined: Wed Oct 23, 2013 1:10 am
Location: India

Post by rbk »

jackson.eyton wrote:To my understanding a Wrapped stage is one that passes data through to an external executable or script and reads the response from that program/script. As such, your support team would still need to maintain that. Assuming I am correct in my assumption :wink:, you should be able to parse the data from DataStage into an array, encode the column in question from that array, and then return the contents of the new array with the encoded value back to DataStage (on console output). I have no real-life experience with this, however.

Another thought: you could perform a similar process by using a series of jobs. One parallel job reads your table and outputs the data as-is to a sequential file. A script then reads that file and creates a new sequential file with the encoded values. A second parallel job reads that new sequential file and loads it into your destination database. A sequence job could be used to handle the job1 -> script -> job2 execution flow.
Thank you so much for your response. So basically I am out of luck with the Wrapped stage, is that right?

Anyway, your idea of the job -> script -> job flow was interesting. But the problem is: how would I read the BLOB column from the Oracle source? I could maybe try to read it as LongVarBinary and then write it to a sequential file, but would a Unix command work on such data? Using any other datatype gives me the error message:

Code: Select all

Expected <datatype> got BLOB instead.
Also, we do not have a direct command to convert the BLOB column to varchar in Oracle. The one we do have handles only 32 KB at a time, so we would need to loop, since some images in the BLOB column run to a few hundred MB. Is there any other way of getting around this issue? Kindly guide... Thank you once again for all your help.
Cheers,
RBK
rbk
Participant
Posts: 23
Joined: Wed Oct 23, 2013 1:10 am
Location: India

Post by rbk »

Thank you for your response, Ramesh.

I have tried the LongVarBinary option, and I always seem to get the error message below at the target.

Code: Select all

ODBC Function "SQLBindParameter" reported SQLSTATE = HY090: Native Error Code = 0: Msg = [DataDirect][ODBC lib] Invalid string or buffer length (CC-OdbcDBStatement::dbsBindINputs, file CC_OdbcDBStatement.cpp, line 424)
I have tried specifying a length, leaving it unbounded, and enabling "Allow LOB references"; everything gives the same error.

Also, two doubts:
1. The error message shows those ODBC functions and .cpp files with line numbers, right? Is there any way we can see those files, or at least see which column is causing this issue?
2. Regarding the filter option in the Sequential File stage: at the source I know the filter command is executed before the data is output to the job. How does it work at the target? Does it apply the command and then write to the target file?

TIA for all your help.
Cheers,
RBK
rameshrr3
Premium Member
Posts: 609
Joined: Mon May 10, 2004 3:32 am
Location: BRENTWOOD, TN

Post by rameshrr3 »

I guess you'd be better off using an Oracle stage such as the Oracle Connector rather than ODBC for such requirements. Most likely your BLOB column has limited support in the ODBC stage, and that is what is causing the error. The C++ code will usually be packaged in a shared library, and you won't really have access to the source code since it is proprietary; moreover, the shared library's symbol table will be stripped, so you won't have access to the original code objects.
To debug, create a user-defined environment variable CC_MSG_LEVEL, set it to 5, run the job, and work with your official support provider through a PMR.
Also check your ODBC driver settings; maybe there is a flag you can put in .odbc.ini to support BLOBs better, perhaps with the Oracle Wire Protocol driver.

FYI, this is the shared library containing the method that's giving the error (this is the name on AIX only; it is a 64-bit XCOFF shared library object). I don't know your machine platform.

/opt/IBM/InformationServer/Server/DSComponents/bin/libccodbc.so

From the output of nm -X64 libccodbc.so:

Code: Select all

.CC_OdbcDBStatement::CC_OdbcDBStatement(int,const unsigned short*,CC_DBStmtParamList*,CC_DBStmtParamList*,CC_OdbcConnection*,int,CC_OdbcAdapter*) T  4294970272
.CC_OdbcDBStatement::dbsBindInputs() T  4294990368
.CC_OdbcDBStatement::dbsBindOutputs() T  4294992128
.CC_OdbcDBStatement::dbsDescribeResultSet() T  4295040000
.CC_OdbcDBStatement::dbsExecute(int) T  4294995968
.CC_OdbcDBStatement::dbsFetch() T  4295039456
.CC_OdbcDBStatement::dbsPrepare() T  4294975808
.CC_OdbcDBStatement::executeDelete(int) T  4295017472
.CC_OdbcDBStatement::executeInsert(int) T  4294999200
.CC_OdbcDBStatement::executeSelect(int) T  4295022624
.CC_OdbcDBStatement::executeUpdate(int) T  4295007808
.CC_OdbcDBStatement::executeUserSQL(int) T  4295023520
.CC_OdbcDBStatement::getConsumer() T  4295045024
.CC_OdbcDBStatement::getNumResultCols() T  4295041536
.CC_OdbcDBStatement::getRowsProcessed() T  4295045056
.CC_OdbcDBStatement::getStmtHandle() T  4295044960
.CC_OdbcDBStatement::hasAnotherStatement(int) T  4295024832
.CC_OdbcDBStatement::hasRejectLink(int) T  4295029536
.CC_OdbcDBStatement::hasRejectProducer() T  4295043104
.CC_OdbcDBStatement::identifyFirstFailingRow(int) T  4295031744
.CC_OdbcDBStatement::isSetSkipPrepare() T  4294990336
.CC_OdbcDBStatement::moveDataRows(CC_DBStmtParamIterator,int,int) T  4295037664
.CC_OdbcDBStatement::moveErrorBuffersToBindBuffers(int) T  4295033984
.CC_OdbcDBStatement::movePendingBuffersToBindBuffers(int) T  4295038464
.CC_OdbcDBStatement::moveRowToErrorBuffers(int) T  4295043168
.CC_OdbcDBStatement::moveRowToPendingBuffers(int,int) T  4295036864
.CC_OdbcDBStatement::printBindBuffer(int,CC_DBStmtParamList*) T  4295025344
.CC_OdbcDBStatement::processRowStatusArray(int) T  4295032000
.CC_OdbcDBStatement::processRowStatusArrayUserSQL(int) T  4295045088
.CC_OdbcDBStatement::rebindLobParameters() T  4294993920
.CC_OdbcDBStatement::rebindToRowIndex(int) T  4295044032
.CC_OdbcDBStatement::rejectRowAt(int) T  4295036000
.CC_OdbcDBStatement::rejectRows(int) T  4295030112
.CC_OdbcDBStatement::restoreBindBuffers() T  4295033184
.CC_OdbcDBStatement::setArraySize(int) T  4295043040
.CC_OdbcDBStatement::setConsumer(CC_OdbcDBRecordDataSetConsumer*) T  4295044992
.CC_OdbcDBStatement::setExecutionMode(int) T  4295043072
.CC_OdbcDBStatement::~CC_OdbcDBStatement() T  4294973792
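
If you just want to check for that symbol on your own platform, filtering the output is enough (nm flags differ by OS; -X64 is the AIX form):

Code: Select all

nm -X64 libccodbc.so | grep -i dbsbindinputs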
jackson.eyton
Premium Member
Posts: 145
Joined: Thu Oct 26, 2017 10:43 am

Post by jackson.eyton »

I really don't have any experience with BLOB data, but have you tried using Runtime Column Propagation (RCP) from the source to a sequential file, to see what the BLOB data looks like then?
-Me
jackson.eyton
Premium Member
Posts: 145
Joined: Thu Oct 26, 2017 10:43 am

Post by jackson.eyton »

Any luck on this?
-Me
rbk
Participant
Posts: 23
Joined: Wed Oct 23, 2013 1:10 am
Location: India

Post by rbk »

Hi all,
Thank you for following up. We have raised a PMR with IBM to help us with the issue. Meanwhile, it has been suggested that we try the DRS Connector with the LongVarBinary datatype for the BLOB column, using the DataDirect drivers. Let us see what other solutions the IBM PMR team comes up with.

Thank you for all the help. Please do let me know of any other things we can try to resolve this issue.
Cheers,
RBK