Parallel routine to count number of bytes
Hi, I need to count the number of bytes in every field of the input, which comes from a positional (fixed-width) TXT file without separators.
I need to use a parallel job, but the parallel engine has no function that counts bytes (the Len function counts characters).
I've written a parallel routine in C. Tested outside DataStage with gcc, it correctly counts the number of bytes using strlen():
#include <string.h>

/* Returns the number of bytes before the terminating NUL. */
int ParLenByte(char *s)
{
    return (int)strlen(s);
}
In DataStage I read the TXT file with a Sequential File stage (or Complex Flat File stage), using type CHAR to split each record into fields of known length.
Then I pass the records to a Transformer, where I call the parallel routine.
The result is the length of the CHAR type as defined in DataStage, not the real number of bytes.
If I read the entire record as a VARCHAR (without splitting it into fields) it works, so I presume DataStage passes a truncated string to the parallel routine when I read the data as CHAR.
Any suggestions?
Welcome.
I assume you want the length of the data in the CHAR field, which is automatically padded (typically with spaces) out to its full size... a.k.a. the nature of the beast. Meaning a CHAR(10) that looks like this:
"6CHARS    "
You want to know that the actual data length is 6 rather than 10, is that correct? So that you can do what next? Knowing what comes next, and what that knowledge would be used for, can lead to a best-practice solution, which probably does not include the need for a custom "parallel routine to count bytes".
-craig
"You can never have too many knives" -- Logan Nine Fingers
Do you need to be able to work with multi-byte data?
In the BASIC Transformer stage you have access to three length functions.
LEN returns the number of characters
BYTELEN returns the number of bytes
DISPLEN returns the number of display positions (e.g. when using double-width or half-width characters)
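The difference between a character count and a byte count can be sketched in C. This is only an illustration and assumes the data is UTF-8 encoded, which the thread does not state; the function names byte_len and utf8_char_len are hypothetical and are not DataStage functions.

```c
#include <stddef.h>
#include <string.h>

/* Bytes before the terminating NUL: this is what strlen() reports. */
size_t byte_len(const char *s)
{
    return strlen(s);
}

/* Characters (code points) in a UTF-8 string. Assumption: the input
 * is valid UTF-8. Every byte that is not a continuation byte
 * (bit pattern 10xxxxxx) starts a new character. */
size_t utf8_char_len(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

For the two Arabic letters encoded as the bytes D8 B3 D9 84, byte_len gives 4 while utf8_char_len gives 2.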
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
strlen counts whitespace. It counts everything until it hits a 0 byte (the end of a C string) in ASCII data, and the wide version behaves similarly for multi-byte data.
A Char in DataStage is padded with spaces so that it always fills the maximum length.
A VarChar in DataStage holds the data as it is, up to the maximum length (anything longer is truncated).
I would read the data as VarChar, and I think a regular Transformer can get the length of the string there. Is that possible with your design?
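The effect of Char padding on strlen can be sketched as follows. This is a minimal illustration, assuming the pad character is a space; trimmed_len is a hypothetical helper, not something DataStage provides.

```c
#include <stddef.h>
#include <string.h>

/* Length in bytes of s without its trailing space padding.
 * Assumption: the field is padded with ' ' (0x20) on the right. */
size_t trimmed_len(const char *s)
{
    size_t n = strlen(s);          /* counts the padding too */
    while (n > 0 && s[n - 1] == ' ')
        --n;                       /* step back over each pad byte */
    return n;
}
```

For a Char(10) value holding "6CHARS" followed by four spaces, strlen returns 10 and trimmed_len returns 6.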
chulett wrote:
I assume you want the length of the data in the CHAR field, something automatically padded with (typically) spaces out to its full size... a.k.a. the nature of the beast. Meaning a CHAR(10) that looks like this:
"6CHARS    "
You want to know that the actual data length is 6 rather than 10, is that correct? So that you can do what next? I'm thinking knowing what comes next / what that knowledge would be used for can lead to a best practice solution which probably does not include the need for a custom "parallel routine to count bytes".

No, I want to count the spaces too, so I want 10 as the result.

ray.wurlod wrote:
Do you need to be able to work with multi-byte data?
In the BASIC Transformer stage you have access to three length functions.
LEN returns the number of characters
BYTELEN returns the number of bytes
DISPLEN returns the number of display positions (e.g. when using double-width or half-width characters)

I don't want to use the BASIC Transformer, only the parallel Transformer.

UCDI wrote:
strlen counts whitespace, it counts everything until 0 (end of C string) is hit in ascii, and similar with the wide version for multi-byte.
char in datastage is padded with spaces to always consume the max length.
varchar in datastage is what the data is, up to the max length (truncates).
I would get the data as varchar, and I think a regular transformer can get the length of the string there? Is that possible with your design?

I tried reading the entire record as VARCHAR, but when I cut it up with a substring function and apply the routine, I get the same wrong result.

chulett wrote:
Would still like to know the "why" of this.

The reason is that I read files from foreign banks, such as Arab banks, so a character that appears to have a length of 1 can take 2 bytes.
If you want the raw number of bytes for Unicode or multi-byte characters, you are going to have to extract the data in a way that lets you look at the bytes, maybe by extracting it as hex in SQL and then counting those.
DataStage can handle multi-byte characters to get the data to you, but I don't know that a string length will give you what you want, because the length of five 2-byte letters is 5, not 10... you have to force it to be bytes and count those. And you can't just do it casually with C or the like, because a 2-byte character has stuff like 00 A3 or whatever, and that 00 read as a byte looks like the end of the string... in fact, are you sure that your strlen approach actually gives the correct answer...?
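The embedded-zero problem can be sketched in C. This is only an illustration, assuming the data arrives as UTF-16 terminated by a 16-bit zero unit, which the thread does not confirm; utf16_byte_len is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

/* Byte length of a UTF-16 buffer, not counting the terminator.
 * Assumption: the buffer ends with a 16-bit zero code unit.
 * Walks two bytes at a time, so a single 0x00 byte inside a
 * character (such as 00 A3) does not stop the count. */
size_t utf16_byte_len(const unsigned char *buf)
{
    size_t n = 0;
    while (buf[n] != 0 || buf[n + 1] != 0)
        n += 2;
    return n;
}
```

For the buffer 00 A3 00 00, strlen stops at the very first byte and returns 0, while utf16_byte_len returns 2. With a UTF-8 source there are no embedded zero bytes, and strlen does return the byte count.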
...While reading the file with the Sequential File stage or Complex Flat File stage, you would need to use the NChar data type or the Unicode extended attribute, and you would need to define the NLS map for the file to handle the multi-byte characters.
Julio Rodriguez
ETL Developer by choice
"Sure we have lots of reasons for being rude - But no excuses