special character handling

Formerly known as "Mercator Inside Integrator 6.7", DataStage TX enables high-volume, complex transactions without the need for additional coding.

Moderators: chulett, rschirm

zaino22
Premium Member
Posts: 81
Joined: Thu Mar 22, 2007 7:10 pm

special character handling

Post by zaino22 »

I am new to DataStage TX. We are converting a non-fixed-length CSV file to XML through TX.

The group that wants to receive the file in XML format provides the XSDs to us so that we can create the output Type Tree and so on. We create the input Type Tree based on the file layout and then create Maps.
They want us to handle all special XML characters in this process, but I don't know where I would do that handling. In the Type Tree or in the maps? If it is in the map, would it be in the input map or the output map?

Can this type of handling be performed in the XSD they provide, so that we don't have to worry about handling these characters?

I understand there are a total of 5 special characters: ampersand, apostrophe, etc.

Please provide a detailed answer; your input is highly appreciated.
rep
Participant
Posts: 82
Joined: Tue Jun 19, 2007 8:04 am
Location: New York City

....

Post by rep »

I can probably help, but I'm not exactly sure what you mean.

You have a CSV on the input and XML on the output. Say the CSV is defined:


PODATE, PONUMBER, POITEM
10/01/01,12345, ABC

and the XML is defined:

<PO>
<PODATE>10/01/01</PODATE>
<PONUMBER>12345</PONUMBER>
<POITEM>ABC</POITEM>
</PO>

You just drag and drop the fields.
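
For readers who think in code, the field-to-field mapping in the PO example can be sketched in Python. This is not TX and says nothing about how TX works internally; it only mirrors the example using the standard `csv` and `xml.etree.ElementTree` modules. The function name `csv_to_po_xml` is made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_po_xml(csv_text):
    """Turn each CSV row into a <PO> element with one child per column."""
    # skipinitialspace drops the blank after each comma, as in ", ABC".
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    documents = []
    for row in reader:
        po = ET.Element("PO")
        for name, value in row.items():
            ET.SubElement(po, name).text = value
        # The serializer escapes characters such as & and < in text itself.
        documents.append(ET.tostring(po, encoding="unicode"))
    return documents


sample = "PODATE, PONUMBER, POITEM\n10/01/01,12345, ABC\n"
print(csv_to_po_xml(sample)[0])
```

In this sketch nobody escapes anything by hand: the XML serializer does it when the element is written out, which is the kind of behavior the quick test suggested in this post would confirm or rule out for TX.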

And input and output maps? Do you mean cards? If the CSV is input, and the XML is output...


It's been a long time since I imported an XSD into a type tree. Have you run a quick test to see if it is handled automatically?
zaino22
Premium Member
Posts: 81
Joined: Thu Mar 22, 2007 7:10 pm

Re: special character handling in TX

Post by zaino22 »

My question is: is it true that if the source CSV file that we want to convert to XML contains special characters, e.g. & ' (ampersand, quotes, etc.), these are handled internally by DataStage TX, and we don't have to take care of them ourselves?

I mean, unless we are writing the XML ourselves, we don't need to handle them in DataStage TX, neither in the input Type Trees nor in the imported XSD that we use to make the output Type Trees (for the target)?

Please confirm.

Thanks!!

rep
Participant
Posts: 82
Joined: Tue Jun 19, 2007 8:04 am
Location: New York City

.....

Post by rep »

Well, maybe there's something about XML (actually, I'm positive there's a lot) that I'm not aware of, but if I'm reading you correctly:


Say the comma-delimited input file had ampersands, and whoever was getting the XML did not want the ampersands in the data. Take POITEM from the example in my post above. On the output POITEM, where I would normally just drag the field over so the rule looks like:

=POITEM:CSV Input:Data Stage Type Tree

I would make it look like:

<XML POITEM FIELD>
=SUBSTITUTE(POITEM:CSV Input:Data Stage Type Tree , "&" , "")
<XML POITEM FIELD/>


The first argument of the SUBSTITUTE function is the field in which you want to replace some strings. The second argument, after the first comma, is the value you want to replace ("&"), and the last argument is what you want to replace it with ("" <- two double quotes, meaning replace it with nothing).
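
As a rough analogy only, the three-argument rule described here behaves like a plain string replace. The Python helper below is a hypothetical stand-in named `substitute`, not the TX function, and it assumes every occurrence of the search string is replaced.

```python
def substitute(text, old, new):
    """Hypothetical stand-in for a three-argument SUBSTITUTE rule.

    Assumes every occurrence of `old` in `text` is replaced by `new`.
    """
    return text.replace(old, new)


# Replacing "&" with the empty string removes it from the field value.
print(substitute("A&B", "&", ""))
```

Passing something other than the empty string as the last argument would swap the ampersand for that text instead of deleting it.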


If you want to remove all the &'s at once, build the first output card as if the & were not even in the data. Then create a second output card and point it to a type tree definition of a blob. (Creating an "item" in a blank type tree gives you those characteristics: text, no terminator, and no max record length. Just a big ole blob of text.)

Once the second output card is created, you know how you switch between output card 1 and output card 2 by clicking on the piece of the card that starts with "#1" or "#2"? With the first output card on top, click and hold on the "#1". The cursor will change into a circle with a slash through it, like a "no smoking" sign. With the button held down, drag it into the gray space between the input card window and the output card window. If the input and output cards are covering all the gray space, you won't be able to drop it, but once you make the input card window smaller or move it out of the way, you'll be able to drag a copy of the output card in there. Now everything still looks the same, except you have an extra floating copy of output card 1.

In the normal output card area, click on output card 2. In the ONE RULE THAT SHOULD BE THERE FOR THE TYPE THAT IS BLOB ON OUTPUT CARD 2, BECAUSE YOU FOLLOWED THESE FANTASTIC INSTRUCTIONS I WROTE, BECAUSE YOU'RE LUCKY I HAD TIME TO KILL BEFORE I CAN LEAVE AND I'D RATHER DO THIS, BECAUSE ALL OF THIS IS IN THE TUTORIAL FOR THIS GREAT PRODUCT THAT'S WORTH LEARNING....

...drag the highest level of the XML output card FROM THE FLOATING WINDOW into the rule. Then add this:

=SUBSTITUTE(Highest level of the XML from output card1 , "&" , "")

The output you really want now is from output card 2. Set it to "file" or whatever output card one would have been, and set output card 1 to "Sink" (look it up).


For any characters other than &, like ]'s:

=SUBSTITUTE(Highest level of the XML from output card1 , "&" , "", "]", "")

Just keep adding them to the one SUBSTITUTE function. If you want to put a different character for each & or ], put something in between the double quotes. Duh!
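
The "keep adding pairs" idea can be sketched in Python as well. The helper `substitute_pairs` is hypothetical, and it assumes the pairs are applied one after another from left to right; that ordering is an assumption of this sketch and has not been checked against TX.

```python
def substitute_pairs(text, *pairs):
    """Apply (old, new) replacement pairs to text, left to right.

    Hypothetical helper: the left-to-right order is an assumption of
    this sketch, not documented TX behavior.
    """
    if len(pairs) % 2 != 0:
        raise ValueError("arguments after the text must come in old/new pairs")
    for old, new in zip(pairs[0::2], pairs[1::2]):
        text = text.replace(old, new)
    return text


# Remove both & and ] in one call, mirroring the rule above.
print(substitute_pairs("A&B]C", "&", "", "]", ""))
```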

Now this is a very important step. First, if this isn't what you were trying to do, lie and tell me what a wonderful person I am for typing all that out and that it worked for you. Else, if this is what you were hoping for and it works, take out your check book and PM me. I'm not putting my personal info up here. What do you think this is, a BSMD website? Sheeesh!
janhess
Participant
Posts: 201
Joined: Thu Sep 18, 2003 2:18 am
Location: UK

Post by janhess »

If you import the XSD, the type tree should handle it, depending on which version you have. There have been bugs in this area with some releases, but generally all special characters are automatically converted. So & becomes &amp; in the output.
If this is not happening, then it is a bug with the XSD importer.
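
To make the conversion concrete, here is what escaping the five characters XML predefines entities for (&, <, >, " and ') looks like in a short Python sketch. It uses `escape` from the standard `xml.sax.saxutils` module; the wrapper name `escape_xml_text` is made up for illustration, and the sketch only shows the expected result of the conversion, not how TX performs it.

```python
from xml.sax.saxutils import escape


def escape_xml_text(value):
    """Replace the five XML special characters with their entities."""
    # escape() handles &, < and > on its own; the two quote characters
    # are supplied as extra entities.
    return escape(value, {'"': "&quot;", "'": "&apos;"})


print(escape_xml_text("A&B"))
```

Comparing a TX output file against this kind of result for a test record containing each of the five characters is one way to check whether a given release converts them automatically.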