DSXchange: DataStage and IBM Websphere Data Integration Forum
rbk
Participant
Joined: 23 Oct 2013
Posts: 22
Location: India

Posted: Tue Jun 05, 2018 12:57 am

DataStage® Release: 11x
Job Type: Parallel
OS: Unix
Hi everyone,
Hope you are all doing well. I'd like to request your expertise for the scenario below.

I am trying to create a custom parallel stage type (Wrapped) and need your help with these queries:
1. Is it possible to design such a stage and have it work on each and every record from the source?
2. Is it possible to make it work on a specific column alone rather than the whole record? For example, say the source has 4 columns, of which one column alone needs to be encoded, after which all four columns are passed to the output for further processing.
3. Is there a sample design or clear documentation for creating the stage? I have been through the forum and the Advanced Developer's Guide, but I am unable to follow the examples listed (such as sort -r and tr) or the options on the Input/Output/Properties tabs. Any clearer documentation or a sample .dsx would be much appreciated.


Scenario:
An Oracle table has 4 columns, including a BLOB column holding PNG images. I am trying to extract them and load them into a Hive table or a sequential file.
The downstream systems are requesting the PNG files as Base64-encoded text, since they are unable to read the binary data. I am not sure where the trouble lies, i.e. the load or the read process. Hence I was thinking of creating a custom stage type which could use a Unix command such as base64 to encode the data to Base64 text. The stage should work on each row read from the source.

Code:
Oracle   -> Custom Wrapped Stage   ->   Target (File/Hive Table)
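For reference, the encoding step itself is a one-liner on most Unix systems. Note that base64 (GNU coreutils) does the Base64 encoding directly, whereas xxd produces a hex dump, not Base64:

```shell
# base64 encodes binary data to Base64 text; pipe through tr -d '\n'
# (or use base64 -w 0 on GNU systems) to keep the output on one line.
printf 'PNG' | base64          # prints UE5H
# xxd -p (without -r) produces a HEX dump, not Base64:
printf 'PNG' | xxd -p          # prints 504e47
```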


Kindly guide, and suggest any other options worth trying.

Also, when a BLOB column is read as LongVarBinary, does it automatically convert the data to HEX format?
Can an External Filter stage be used for this requirement? Will it act on each record from the source? If so, could I have a sample command?

Options considered:
1. Reading the BLOB column as LongVarBinary and writing to Hive. The downstream Java systems are unable to read the data and have reported it as corrupted.
2. Parallel routines: the support team may be reluctant to support C++ code.
3. BASIC routines: I was unable to find any that work; most attempts return an "unrecognised character encountered" error.

Thanks in advance for all the help. I'd be happy to provide any additional information on the scenario if needed.

_________________
Cheers,
RBK
jackson.eyton
Premium Member
Joined: 26 Oct 2017
Posts: 106

Posted: Tue Jun 05, 2018 10:07 am

To my understanding, a wrapped stage is one that passes data through to an external executable or script and reads the response from that program/script. As such, your support team would still need to maintain it. Assuming I am correct in my assumption, you should be able to parse the data from DataStage into an array, encode the column in question from that array, and then return the contents of the new array with the encoded value back to DataStage (via console output). I've no real-life experience with this, however.

Another thought: you could perform a similar process using a series of jobs. One parallel job reads your table and outputs the data as-is to a sequential file. A script then reads that file and creates a new sequential file with the encoded values. A second parallel job reads that new sequential file and loads it into your destination database. A sequence job could handle the job1 -> script -> job2 execution flow.
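A minimal sketch of what that middle script could look like, assuming (purely for illustration) that the first job writes pipe-delimited rows with the BLOB column as a hex string in the fourth field; the layout and field names are hypothetical:

```shell
#!/bin/sh
# Hypothetical middle step of the job -> script -> job flow:
# re-encode the 4th field of each pipe-delimited row from hex to Base64.
# Assumes the first parallel job wrote the BLOB column as a hex string;
# the 4-column layout is illustrative only.
encode_col4() {
  while IFS='|' read -r c1 c2 c3 c4; do
    # hex -> raw bytes -> Base64, stripped to a single line
    b64=$(printf '%s' "$c4" | xxd -r -p | base64 | tr -d '\n')
    printf '%s|%s|%s|%s\n' "$c1" "$c2" "$c3" "$b64"
  done
}

# Example row: 504e47 is the hex of the ASCII text "PNG".
printf '1|a|b|504e47\n' | encode_col4   # prints 1|a|b|UE5H
```

In a sequence job this script would sit between the extract job and the load job, reading the first job's output file and writing the encoded file for the second job to consume.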

_________________
-Me
rameshrr3
Premium Member
Joined: 10 May 2004
Posts: 609
Location: BRENTWOOD, TN

Posted: Tue Jun 05, 2018 3:05 pm

A wrapped stage only accepts pipe-safe commands, so it doesn't really work row by row. Inserting a Wave Generator to force row-by-row processing may not work either, because the Wrapped stage is typically not "wave-aware".
Maybe try a Sequential File stage with the multiple-files option and use a filter command.
Is there no Oracle function you can use to encode the BLOB before you read it?
LongVarBinary should be fine. I don't know the maximum length allowed, but make it an unbounded LongVarBinary.
rbk
Participant
Joined: 23 Oct 2013
Posts: 22
Location: India

Posted: Fri Jun 08, 2018 12:49 am

jackson.eyton wrote:
To my understanding, a wrapped stage is one that passes data through to an external executable or script and reads the response from that program/script. As such, your support team would still need to maintain it. Assuming I am correct in my assumption, you should be able to parse the data from DataStage into an array, encode the column in question from that array, and then return the contents of the new array with the encoded value back to DataStage (via console output). I've no real-life experience with this, however.

Another thought: you could perform a similar process using a series of jobs. One parallel job reads your table and outputs the data as-is to a sequential file. A script then reads that file and creates a new sequential file with the encoded values. A second parallel job reads that new sequential file and loads it into your destination database. A sequence job could handle the job1 -> script -> job2 execution flow.


Thank you so much for your response. So basically I am out of luck with the wrapped stage, is it?

Anyway, your idea of the job -> script -> job flow was interesting. The problem, though, is how I would read the BLOB column from the Oracle source. I could try to read it as LongVarBinary and then write it to a sequential file, but would Unix commands work on such data? Using any other datatype gives me the error message:
Code:
Expected <datatype> got BLOB instead.


Also, we do not have a direct function to convert the BLOB column to varchar in Oracle. The one we have handles only 32k bytes at a time, and we may need to loop, since some images in the BLOB column run to a few hundred MB. Is there any other way around this issue? Kindly guide. Thank you once again for all your help.
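One note on looping in 32k chunks: Base64 output from separately encoded chunks concatenates cleanly only when each chunk's byte length is a multiple of 3, because then no "=" padding appears mid-stream. So a chunked loop should use, e.g., 32766-byte pieces rather than 32768. A quick shell demonstration of the principle, splitting a 6-byte value into two 3-byte chunks:

```shell
# Encoding a value in one go vs. in 3-byte-aligned chunks gives
# identical Base64, since no '=' padding occurs mid-stream.
whole=$(printf 'abcdef' | base64)
parts=$( { printf 'abc' | base64; printf 'def' | base64; } | tr -d '\n')
echo "$whole"   # prints YWJjZGVm
echo "$parts"   # prints YWJjZGVm (identical)
```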

_________________
Cheers,
RBK
rbk
Participant
Joined: 23 Oct 2013
Posts: 22
Location: India

Posted: Fri Jun 08, 2018 12:59 am

Thank you for your response, Ramesh.

I have tried the LongVarBinary option, and I always seem to get the error below at the target.
Code:
ODBC Function "SQLBindParameter" reported SQLSTATE = HY090: Native Error Code = 0: Msg = [DataDirect][ODBC lib] Invalid string or buffer length (CC-OdbcDBStatement::dbsBindINputs, file CC_OdbcDBStatement.cpp, line 424)

I have tried with a fixed length, unbounded, and "Allow LOB references"; everything gives the same error.

Also, two questions:
1. The error message mentions ODBC functions and a .cpp file with line numbers. Is there any way to see those files, or at least to see which column is causing the issue?
2. Regarding the filter option in the Sequential File stage: at the source, I know the filter command is executed before the data is passed to the job. How does it work at the target? Does it apply the command and then write to the target file?

TIA for all your help.

_________________
Cheers,
RBK
rameshrr3
Premium Member
Joined: 10 May 2004
Posts: 609
Location: BRENTWOOD, TN

Posted: Fri Jun 08, 2018 5:21 pm

I guess you'd be better off using an Oracle stage, such as the Oracle Connector, rather than ODBC for this requirement. Most likely your BLOB column has limited support in the ODBC stage, and that is what is causing the error. The C++ program will usually be packaged in a shared library, and you won't have access to the source code as it is proprietary; moreover, the shared library's symbol table will be stripped, so you won't have access to the original code objects.
To debug, create a user-defined environment variable CC_MSG_LEVEL, set it to 5, run the job, and work with your official support provider through a PMR.
Also check your ODBC driver settings; maybe there is a flag you can set in .odbc.ini to support BLOBs better, e.g. with the Oracle Wire Protocol driver.

FYI, this is the shared library containing the method that is reporting the error (this is its name on AIX only, where it is a 64-bit XCOFF shared library object); I don't know your machine's platform.

/opt/IBM/InformationServer/Server/DSComponents/bin/libccodbc.so

From the output of
nm -X64 libccodbc.so


Code:
.CC_OdbcDBStatement::CC_OdbcDBStatement(int,const unsigned short*,CC_DBStmtParamList*,CC_DBStmtParamList*,CC_OdbcConnection*,int,CC_OdbcAdapter*) T  4294970272
.CC_OdbcDBStatement::dbsBindInputs() T  4294990368
.CC_OdbcDBStatement::dbsBindOutputs() T  4294992128
.CC_OdbcDBStatement::dbsDescribeResultSet() T  4295040000
.CC_OdbcDBStatement::dbsExecute(int) T  4294995968
.CC_OdbcDBStatement::dbsFetch() T  4295039456
.CC_OdbcDBStatement::dbsPrepare() T  4294975808
.CC_OdbcDBStatement::executeDelete(int) T  4295017472
.CC_OdbcDBStatement::executeInsert(int) T  4294999200
.CC_OdbcDBStatement::executeSelect(int) T  4295022624
.CC_OdbcDBStatement::executeUpdate(int) T  4295007808
.CC_OdbcDBStatement::executeUserSQL(int) T  4295023520
.CC_OdbcDBStatement::getConsumer() T  4295045024
.CC_OdbcDBStatement::getNumResultCols() T  4295041536
.CC_OdbcDBStatement::getRowsProcessed() T  4295045056
.CC_OdbcDBStatement::getStmtHandle() T  4295044960
.CC_OdbcDBStatement::hasAnotherStatement(int) T  4295024832
.CC_OdbcDBStatement::hasRejectLink(int) T  4295029536
.CC_OdbcDBStatement::hasRejectProducer() T  4295043104
.CC_OdbcDBStatement::identifyFirstFailingRow(int) T  4295031744
.CC_OdbcDBStatement::isSetSkipPrepare() T  4294990336
.CC_OdbcDBStatement::moveDataRows(CC_DBStmtParamIterator,int,int) T  4295037664
.CC_OdbcDBStatement::moveErrorBuffersToBindBuffers(int) T  4295033984
.CC_OdbcDBStatement::movePendingBuffersToBindBuffers(int) T  4295038464
.CC_OdbcDBStatement::moveRowToErrorBuffers(int) T  4295043168
.CC_OdbcDBStatement::moveRowToPendingBuffers(int,int) T  4295036864
.CC_OdbcDBStatement::printBindBuffer(int,CC_DBStmtParamList*) T  4295025344
.CC_OdbcDBStatement::processRowStatusArray(int) T  4295032000
.CC_OdbcDBStatement::processRowStatusArrayUserSQL(int) T  4295045088
.CC_OdbcDBStatement::rebindLobParameters() T  4294993920
.CC_OdbcDBStatement::rebindToRowIndex(int) T  4295044032
.CC_OdbcDBStatement::rejectRowAt(int) T  4295036000
.CC_OdbcDBStatement::rejectRows(int) T  4295030112
.CC_OdbcDBStatement::restoreBindBuffers() T  4295033184
.CC_OdbcDBStatement::setArraySize(int) T  4295043040
.CC_OdbcDBStatement::setConsumer(CC_OdbcDBRecordDataSetConsumer*) T  4295044992
.CC_OdbcDBStatement::setExecutionMode(int) T  4295043072
.CC_OdbcDBStatement::~CC_OdbcDBStatement() T  4294973792
jackson.eyton
Premium Member
Joined: 26 Oct 2017
Posts: 106

Posted: Mon Jun 11, 2018 7:59 am

I really don't have any experience with BLOB data, but have you tried using Runtime Column Propagation (RCP) from the source to a sequential file, to see what the BLOB data looks like?

_________________
-Me
jackson.eyton
Premium Member
Joined: 26 Oct 2017
Posts: 106

Posted: Thu Jun 14, 2018 9:32 am

Any luck on this?

_________________
-Me