DSXchange: DataStage and IBM Websphere Data Integration Forum
This topic has been marked "Resolved."
Author Message
krishna81



Group memberships:
Premium Members

Joined: 16 May 2006
Posts: 78
Location: USA
Points: 790

Posted: Wed Aug 05, 2009 11:32 am

Hi, I tried the above routine and I have an issue. I am able to process a low volume of records (e.g. 1000), but when I ran it with 15 million records it hangs with no warnings. Is there a buffer issue in this C program?

_________________
Datastage User
krishna81

Posted: Wed Aug 05, 2009 11:35 am

Here is the program I tried; it does not work for large record volumes. I would appreciate any suggestions.


#include "stdio.h"
#include "string.h"
#include "stdlib.h"

char* pxEreplace(char *str, char *subStr, char *rep, int num, int beg)
{
char *result = (char *)malloc (sizeof(char *));
int newlen = strlen(rep);
int oldlen = strlen(subStr);
int i, x, count = 0;

//If beginning is less than or equal to 1 then default it to 1
if (beg <= 1)
{beg = 1;}

//replace all instances if value of num less than or equal to 0
if (num <= 0)
{num = strlen(str);}

//Get the character position in i for substring instance to start from
for (i = 0; str[i] != '\0' ; i++)
{
if (strstr(&str[i], subStr) == &str[i])
{
count++;
i += oldlen - 1;
if (count == beg)
{break;}
}
}

//Get everything before position i before replacement begins

x = 0;
while (i != x)
{ result[x++] = *str++; }

//Start replacement
while (*str) //for the complete input string
{

if (num != 0 ) // until no more occurrences need to be changed
{
if (strstr(str, subStr) == str )
{
strcpy(&result[x], rep);
x += newlen;
str += oldlen;
num--;
}
else // if no match is found
{
result[x++] = *str++;
}
}
else
{
result[x++] = *str++;
}
}

result[x] = '\0'; //Terminate the string
return result; //Return the replaced string

}

chulett

Premium Poster


since January 2006

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 12 Nov 2002
Posts: 42533
Location: Denver, CO
Points: 218869

Posted: Wed Aug 05, 2009 2:01 pm

Suggest you create a new Topic in the PX forum for this.

_________________
-craig

I know I don't say this enough, but I like when you talk to me. It's much better than when nobody talks to me. Or when people that I don't like will not stop talking to me.
Sainath.Srinivasan

Premium Poster
Participant

Group memberships:
Heartland Usergroup

Joined: 17 Jan 2005
Posts: 3337
Location: United Kingdom
Points: 14195

Posted: Thu Aug 06, 2009 1:44 am

At first glance this looks like a memory allocation problem caused by the malloc call.

Try sizing the buffer explicitly, e.g. result[1000] or your maximum string length.

Alternatively you can use another variable at the end to hold the return value and free the result.
DSguru2B

Premium Poster


since February 2006

Group memberships:
Premium Members, Heartland Usergroup

Joined: 09 Feb 2005
Posts: 6854
Location: Houston, TX
Points: 35675

Posted: Thu Oct 21, 2010 10:34 am

Krishna was kind enough to fix the memory allocation issue with large numbers of records by incorporating Sainath's suggestion. Click here for the updated routine. Ever since I wrote this and moved ...

_________________
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Rob4732



Group memberships:
Premium Members

Joined: 06 Oct 2008
Posts: 66

Points: 556

Posted: Wed May 04, 2011 10:28 am

I noticed that if the replacement string is empty (""), the job ends with a fatal SIGSEGV. You can probably use Trim to remove a string instead, though.

thx

_________________
We don't see things as they are;
We see them as we are.
DSguru2B

Posted: Thu May 05, 2011 9:11 am

Rob4732 wrote:
Noticed if your replacement string is nothing(""), job ends with a SIGSEGV fatal. You can probably use Trim to remove a string instead though. thx

That's true. You ...

PhilHibbs



Group memberships:
Premium Members

Joined: 29 Sep 2004
Posts: 1043
Location: Nottingham, UK
Points: 12388

Posted: Mon Nov 21, 2011 10:36 am

DSguru2B wrote:
Ok, i finally got the chance to complete it.


Pardon me if my C/C++ coding skills are rusty, but I think there is a serious issue with this.
Code:

  char *result = (char *)malloc (sizeof(char *));

CMIIW, but that allocates a buffer only the size of a pointer (4 bytes on a 32-bit build, 8 on a 64-bit one), so any result string longer than 3 (or 7) characters will overrun the allocated buffer and corrupt the heap.

To be honest, I can't see a way around this. Even if you allocate enough memory in the routine, you are returning a pointer to a buffer that will never be deallocated, and thus creating a memory leak. If you declare a static buffer, then it is not parallel-safe as every instance of the routine will have the same buffer. Does DataStage provide a hook for allocating memory that it will deallocate correctly afterwards? *Edit* Unless DataStage will always call free() on any char* that is returned in this way?
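
To make the sizing concern concrete, here is a minimal sketch. Both `buffer_bytes_needed` and `copy_to_heap` are hypothetical helpers, not part of the routine. The sketch assumes the caller owns the returned buffer and calls free() on it, which is only an assumption, since nothing in this thread establishes what DataStage does with the returned pointer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: bytes required to hold a copy of str.
   Note that sizeof(char *) would give the size of a pointer instead,
   which has nothing to do with the length of the string. */
size_t buffer_bytes_needed(const char *str)
{
    return strlen(str) + 1;               /* characters plus the '\0' terminator */
}

/* Hypothetical helper: returns a heap copy of str sized from its length.
   Assumption: the caller owns the result and must free() it. */
char *copy_to_heap(const char *str)
{
    size_t needed = buffer_bytes_needed(str);
    char *result = (char *)malloc(needed);
    if (result == NULL) { return NULL; }  /* allocation can fail */
    memcpy(result, str, needed);          /* copies the terminator too */
    return result;
}
```

A caller would use it as `char *c = copy_to_heap("TEST AA>BB"); ... free(c);`. If nobody calls free(), every row processed leaks one buffer, which is the leak described above.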

_________________
Phil Hibbs | Capgemini
Technical Consultant
Google+ Data Tools Page

Last edited by PhilHibbs on Mon Nov 21, 2011 11:08 am; edited 1 time in total
PhilHibbs

Posted: Mon Nov 21, 2011 10:38 am

Additionally, I cannot get this test case to work:
Code:
pxEreplace( "TEST AA>BB", "AA", "BB", 0, 1 )

The output of that is "TEST AA>BB", rather than "TEST BB>BB". I can't get any multi-character replacement to work.

This works fine:
Code:
pxEreplace( "TEST P>Q PP>QQ", "P", "Q", 0, 1 )

...and returns this:
Code:
TEST Q>Q QQ>QQ


*UPDATE* Fixed it! The error is here:
Code:
     if (strstr(&str[i], subStr) == &str[i])
     {
      count++;
      i += oldlen - 1;
      if (count == beg)
      {break;}
     }

I changed it to this:
Code:
     if (strstr(&str[i], subStr) == &str[i])
     {
      count++;
      if (count == beg)
      {break;}
      i += oldlen - 1;
     }


Also, for performance reasons I replaced all references to strstr with strncmp instead:
Code:
     if (strncmp(&str[i], subStr, oldlen) == 0)
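
Here is a small sketch of why the order of the two statements matters. `find_nth_occurrence` is a hypothetical helper written for illustration, not code from the routine. With the original ordering, `i` was advanced to the last character of the match before the break, so for "TEST AA>BB" it ended up at 6 instead of 5, the prefix copy swallowed the first "A", and the replacement loop never saw "AA". Single-character substrings were unaffected because `oldlen - 1` is 0.

```c
#include <string.h>

/* Hypothetical helper: returns the index where the n-th (1-based),
   non-overlapping occurrence of sub starts in str, or -1 if there is none. */
int find_nth_occurrence(const char *str, const char *sub, int n)
{
    size_t sublen = strlen(sub);
    int count = 0;
    size_t i;

    if (sublen == 0 || n < 1) { return -1; }

    for (i = 0; str[i] != '\0'; i++)
    {
        /* strncmp only inspects the next sublen characters; strstr would
           search the whole remainder of the string on every iteration */
        if (strncmp(&str[i], sub, sublen) == 0)
        {
            count++;
            if (count == n) { return (int)i; }  /* test before skipping ahead */
            i += sublen - 1;                     /* jump past this match */
        }
    }
    return -1;
}
```

For example, `find_nth_occurrence("TEST AA>BB", "AA", 1)` gives 5, the index of the first "A".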

Last edited by PhilHibbs on Mon Nov 21, 2011 11:08 am; edited 1 time in total

PhilHibbs

Posted: Mon Nov 21, 2011 11:32 am

In fact, here is my complete version including malloc fix:
(code removed, see later post for source that fixes another issue)

Last edited by PhilHibbs on Wed Nov 23, 2011 5:57 am; edited 1 time in total

PhilHibbs

Posted: Wed Nov 23, 2011 3:59 am

This routine is now causing my job to abort with this:

APT_CombinedOperatorController(5),1: Operator terminated abnormally: received signal SIGSEGV
APT_CombinedOperatorController(5),0: Operator terminated abnormally: received signal SIGSEGV

Any ideas? I have disabled the main body of the routine in order to rule out a programming error:

Code:
#include "string.h"
#include "stdlib.h"

char* pxEreplace(char *str, char *subStr, char *rep, int num, int beg)
{
  int buflen = strlen(str)+1;
  char *result = (char *)malloc( buflen );
  int newlen = strlen(rep);
  int oldlen = strlen(subStr);
  int i, x, count = 0;

  if (result==0) {return 0;}

strcpy(result,str);
return result;
...

*Update* It works fine in a test job that has a row generator that feeds 1000000 rows into a Transformer, that does 4 different replaces including a nested call that replaces two strings:
Code:
pxEreplace( pxEreplace( "AA>BB", ">", "=", 0, 1 ), "AA", "BB", 0, 0 )

PhilHibbs

Posted: Wed Nov 23, 2011 4:42 am

PhilHibbs wrote:
This routine is now causing my job to abort with this:

APT_CombinedOperatorController(5),1: Operator terminated abnormally: received signal SIGSEGV
APT_CombinedOperatorController(5),0: Operator terminated abnormally: received signal SIGSEGV


I think I have found it - it's if the str parameter is an empty string. Not sure why this is, as the function should just sail through and do nothing...

I have worked around by adding this to the start:
Code:
  if (buflen==1) {result[0]='\0'; return result;}
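
One possible explanation, assuming the routine can be handed a NULL pointer instead of an empty string (a later post in this thread reports exactly that, but it is not verified here): calling strlen() on NULL is undefined behaviour and typically shows up as a SIGSEGV. A minimal guard could look like the sketch below. `or_empty` and `safe_length` are hypothetical names, not part of the routine.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: maps a NULL pointer to an empty string so that
   strlen() and friends are never called on NULL. */
const char *or_empty(const char *s)
{
    return (s != NULL) ? s : "";
}

/* Hypothetical helper: string length that treats NULL like "" */
size_t safe_length(const char *s)
{
    return strlen(or_empty(s));
}
```

Applying such a guard to every char* argument before the first strlen() call keeps the empty-input case on the normal code path.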

Last edited by PhilHibbs on Fri Nov 25, 2011 12:32 pm; edited 1 time in total

PhilHibbs

Posted: Fri Nov 25, 2011 12:32 pm

DataStage passes a null pointer rather than an empty string. This should fix the problems that that causes:

Code:
/******************************************************************************
* pxEreplace - DataStage parallel routine
*
* Published on DSXchange.com by user DSguru2B
* http://www.dsxchange.com/viewtopic.php?t=106358
*
* Bugs (malloc, realloc, count) fixed by Philip Hibbs, Capgemini
*
* INSTRUCTIONS
*
* 1. Copy the source file pxEreplace.cpp into a directory on the server
* 2. Run the following command:
*
*         g++ -O -fPIC -Wno-deprecated -c pxEreplace.cpp
*
* (check Administrator->Properties->Environment->Parallel->Compiler settings)
*
* 3. Copy the output into the DataStage library directory:
*
*         cp pxEreplace.o `cat /.dshome`/../PXEngine/lib/pxEreplace.o
*
* 4. Create the Parallel Routine with the following properties:
*
* Routine Name             : pxEreplace
* External subroutine name : pxEreplace
* Type                     : External function
* Object type              : Object
* Return type              : char*
* Library path             : /software/opt/IBM/InformationServer/Server/PXEngine/lib/pxEreplace.o
* Arguments:
*     str     I  char*
*     subStr  I  char*
*     rep     I  char*
*     num     I  int
*     beg     I  int
*
* Save & Close
*
* Any time that anything changes, you must recompile all jobs that use the routine.
*
******************************************************************************/

#include <string.h>
#include <stdlib.h>

char* pxEreplace(char *str, char *subStr, char *rep, int num, int beg)
{
  char empty[1]="";

  if (!str) {str = empty;}
  if (!subStr) {subStr = empty;}
  if (!rep) {rep = empty;}

  int buflen = strlen(str)+1;
  char *result = (char *)malloc( buflen );

  if (!result) {return 0;}
  if (buflen==1) {result[0]='\0'; return result;}

  int oldlen = strlen(subStr);
  int newlen = strlen(rep);

  int i, x, count = 0;

  if (oldlen==0)
  { // special case - insert rep once at the start of the string and return
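    if (newlen>0)
    {
      buflen = buflen + newlen;
      char *grown = (char *)realloc( result, buflen );
      if (!grown) { free(result); return 0; } // realloc failed; release the old block
      result = grown;
    }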
    strcpy(result, rep);
    strcpy(result+newlen, str);
    return result;
  }

  //If beginning is less than or equal to 1 then default it to 1
  if (beg <= 1)
  {beg = 1;}

  //replace all instances if value of num less than or equal to 0
  if (num <= 0)
  {num = buflen;}

  //Get the character position in i for substring instance to start from
  for (i = 0; str[i] != '\0' ; i++)
  {
    if (strncmp(&str[i], subStr, oldlen) == 0)
    {
      count++;
      if (count == beg) { break; }
      i += oldlen - 1;
    }
  }

  //Get everything before position i before replacement begins

  x = 0;
  while (i != x)
  {  result[x++] = *str++; }

  //Start replacement
  while (*str) //for the complete input string
  {

    if (num != 0 ) // until no more occurrences need to be changed
    {
      if (strncmp(str, subStr, oldlen) == 0)
      {
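        if (newlen > oldlen)
        { // grow the buffer by the extra length of this replacement
          buflen = buflen + (newlen - oldlen);
          char *grown = (char *)realloc( result, buflen );
          if (!grown) { free(result); return 0; } // realloc failed; release the old block
          result = grown;
        }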
        strcpy(&result[x], rep);
        x += newlen;
        str += oldlen;
        num--;
      }
      else // if no match is found
      {
        result[x++] = *str++;
      }
    }
    else
    {
      result[x++] = *str++;
    }
  }

  result[x] = '\0'; //Terminate the string
  return result; //Return the replaced string
}
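
As an alternative to growing the buffer with realloc on every match, the output size can be computed up front in a first pass. The sketch below is a different technique from the routine above, not a drop-in replacement: `replace_all_sized` is a hypothetical function that replaces every occurrence only (no `num` or `beg` arguments), and for an empty `sub` it simply returns a copy of the input rather than inserting `rep` at the start. It assumes the caller frees the result.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical two-pass variant: replaces every non-overlapping occurrence
   of sub in str with rep. NULL arguments are treated as empty strings.
   Returns a malloc'd string the caller must free(), or NULL on failure. */
char *replace_all_sized(const char *str, const char *sub, const char *rep)
{
    if (!str) { str = ""; }
    if (!sub) { sub = ""; }
    if (!rep) { rep = ""; }

    size_t oldlen = strlen(sub);
    size_t newlen = strlen(rep);
    size_t matches = 0;
    const char *p;

    /* Pass 1: count matches so the exact output size is known up front */
    if (oldlen > 0)
    {
        for (p = strstr(str, sub); p != NULL; p = strstr(p + oldlen, sub))
        { matches++; }
    }

    size_t outlen = strlen(str) - matches * oldlen + matches * newlen;
    char *result = (char *)malloc(outlen + 1);
    if (!result) { return NULL; }

    /* Pass 2: copy the input, substituting rep at each match */
    char *out = result;
    while (*str)
    {
        if (oldlen > 0 && strncmp(str, sub, oldlen) == 0)
        {
            memcpy(out, rep, newlen);
            out += newlen;
            str += oldlen;
        }
        else
        {
            *out++ = *str++;
        }
    }
    *out = '\0';   /* terminate the string */
    return result;
}
```

The trade-off is one extra scan of the input in exchange for a single allocation and no realloc failure path inside the copy loop.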

RiyaNY
Participant



Joined: 15 Jan 2014
Posts: 9
Location: Mumbai
Points: 71

Posted: Wed Sep 28, 2016 3:21 am

Where should I put this code?
I want to use this function to replace a string in a Transformer, but I have no clue how to go ahead with this code.

_________________
Warm Regards,
Riya Yawalkar
chulett

Posted: Wed Sep 28, 2016 6:54 am

Don't take this the wrong way but if you're not skilled in all of the ways of C++ then this isn't a path for you. IMHO, you'd be better served by starting a new post and letting us know what kind of a 'string problem' you are having. Then we can suggest alternatives.

All times are GMT - 6 Hours