Issue with routine

Posted: Wed Aug 05, 2009 11:32 am
by krishna81
Hi, I tried the above routine and I have an issue. I am able to process a small volume of records (e.g. 1000), but when I ran it with 15 million records the job hangs with no warnings. Is there a buffer issue in this C program?

Posted: Wed Aug 05, 2009 11:35 am
by krishna81
Here is the program I tried; it does not work for large numbers of records. I would appreciate any suggestions.


#include "stdio.h"
#include "string.h"
#include "stdlib.h"

char* pxEreplace(char *str, char *subStr, char *rep, int num, int beg)
{
char *result = (char *)malloc (sizeof(char *));
int newlen = strlen(rep);
int oldlen = strlen(subStr);
int i, x, count = 0;

//If beginning is less than or equal to 1 then default it to 1
if (beg <= 1)
{beg = 1;}

//replace all instances if value of num less than or equal to 0
if (num <= 0)
{num = strlen(str);}

//Get the character position in i for substring instance to start from
for (i = 0; str[i] != '\0' ; i++)
{
if (strstr(&str[i], subStr) == &str[i])
{
count++;
i += oldlen - 1;
if (count == beg)
{break;}
}
}

//Get everything before position i before replacement begins

x = 0;
while (i != x)
{ result[x++] = *str++; }

//Start replacement
while (*str) //for the complete input string
{

if (num != 0 ) // until no more occurrences need to be changed
{
if (strstr(str, subStr) == str )
{
strcpy(&result[x], rep);
x += newlen;
str += oldlen;
num--;
}
else // if no match is found
{
result[x++] = *str++;
}
}
else
{
result[x++] = *str++;
}
}

result[x] = '\0'; //Terminate the string
return result; //Return the replaced string

}

Posted: Wed Aug 05, 2009 2:01 pm
by chulett
Suggest you create a new Topic in the PX forum for this.

Posted: Thu Aug 06, 2009 1:44 am
by Sainath.Srinivasan
At first glance this looks like a memory allocation problem caused by the malloc call.

Try setting that as result[1000] or your maximum string length.

Alternatively you can use another variable at the end to hold the return value and free the result.
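
To make the "maximum string length" idea concrete, here is a minimal sketch of working out how many bytes the result could need before allocating. The helper name `replaced_size` is hypothetical and not part of the posted routine. It assumes every non-overlapping occurrence is replaced, and it treats an empty search string as "no replacement".

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: number of bytes (including the terminating '\0')
 * needed if every non-overlapping occurrence of sub in str is replaced
 * by rep. An empty sub is treated as "nothing to replace". */
size_t replaced_size(const char *str, const char *sub, const char *rep)
{
    assert(str != NULL && sub != NULL && rep != NULL);

    size_t len    = strlen(str);
    size_t oldlen = strlen(sub);
    size_t newlen = strlen(rep);
    size_t hits   = 0;
    const char *p = str;

    if (oldlen == 0) {
        return len + 1;
    }

    /* Count non-overlapping matches by jumping past each one. */
    while ((p = strstr(p, sub)) != NULL) {
        hits++;
        p += oldlen;
    }

    return len - hits * oldlen + hits * newlen + 1;
}
```

A buffer allocated with this size can hold the fully replaced string, so no fixed guess such as 1000 is needed.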

Posted: Thu Oct 21, 2010 10:34 am
by DSguru2B
Krishna was kind enough to fix the memory allocation issue with large numbers of records by incorporating Sainath's suggestion. Click here for the updated routine.
Ever since I wrote this and moved on to other clients I haven't gotten a chance to work at an enterprise shop yet, so I could not update the routine based on feedback. I am glad others are taking the initiative and making it better.

Posted: Wed May 04, 2011 10:28 am
by Rob4732
Noticed that if your replacement string is empty (""), the job ends with a fatal SIGSEGV. You can probably use Trim to remove a string instead, though.

thx

Posted: Thu May 05, 2011 9:11 am
by DSguru2B
Rob4732 wrote: Noticed that if your replacement string is empty (""), the job ends with a fatal SIGSEGV. You can probably use Trim to remove a string instead, though.

thx
That's true. You would use pxEreplace() for any string replacement. If you want to get rid of a string, replace it with a space and then apply the Convert() function to get rid of the spaces.

Posted: Mon Nov 21, 2011 10:36 am
by PhilHibbs
DSguru2B wrote: OK, I finally got the chance to complete it.
Pardon me if my C/C++ coding skills are rusty, but I think there is a serious issue with this.

Code:

  char *result = (char *)malloc (sizeof(char *));
CMIIW, but that allocates a buffer the size of a pointer (4 bytes on a 32-bit build, 8 on a 64-bit one) to write the result into, so if the result string plus its terminator is longer than that, this will overrun the allocated buffer and corrupt the heap.
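
A minimal sketch of the difference, assuming a standalone C build outside DataStage: `sizeof(char *)` is the size of the pointer itself, whereas the buffer has to be sized from the length of the string it will hold. The helper `copy_string` is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of src. The allocation is
 * sized from strlen(src), not from sizeof(char *). */
char *copy_string(const char *src)
{
    assert(src != NULL);

    size_t len = strlen(src);
    char *buf = (char *)malloc(len + 1);   /* +1 for the terminating '\0' */

    if (buf == NULL) {
        return NULL;                       /* allocation failed */
    }

    memcpy(buf, src, len + 1);             /* copies the '\0' as well */
    return buf;                            /* caller is responsible for free() */
}
```

For a string such as "TEST AA>BB", `strlen` reports 10, so 11 bytes are needed, which is more than a pointer-sized allocation provides on either a 32-bit or a 64-bit build.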

To be honest, I can't see a way around this. Even if you allocate enough memory in the routine, you are returning a pointer to a buffer that will never be deallocated, and thus creating a memory leak. If you declare a static buffer, then it is not parallel-safe as every instance of the routine will have the same buffer. Does DataStage provide a hook for allocating memory that it will deallocate correctly afterwards? *Edit* Unless DataStage will always call free() on any char* that is returned in this way?

Posted: Mon Nov 21, 2011 10:38 am
by PhilHibbs
Additionally, I cannot get this test case to work:

Code:

pxEreplace( "TEST AA>BB", "AA", "BB", 0, 1 )
The output of that is "TEST AA>BB", rather than "TEST BB>BB". I can't get any multi-character replacement to work.

This works fine:

Code:

pxEreplace( "TEST P>Q PP>QQ", "P", "Q", 0, 1 )
...and returns this:

Code:

TEST Q>Q QQ>QQ
*UPDATE* Fixed it! The error is here:

Code:

     if (strstr(&str[i], subStr) == &str[i]) 
     { 
      count++; 
      i += oldlen - 1; 
      if (count == beg) 
      {break;} 
     } 
I changed it to this:

Code:

     if (strstr(&str[i], subStr) == &str[i]) 
     { 
      count++; 
      if (count == beg) 
      {break;} 
      i += oldlen - 1; 
     } 
Also, for performance reasons I replaced all references to strstr with strncmp instead:

Code:

     if (strncmp(&str[i], subStr, oldlen) == 0)
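
Pulled out of the routine, the corrected search loop can be sketched as a standalone helper. The name `nth_occurrence` is hypothetical. It returns the index of the beg-th non-overlapping match, or the string length if there are fewer matches, which mirrors what the loop leaves in `i`. The `strncmp` test only compares the first `oldlen` characters at the current position, whereas `strstr(&str[i], subStr) == &str[i]` may scan the rest of the string before its result is compared.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: index of the beg-th non-overlapping occurrence of
 * sub in str, or strlen(str) if there are fewer than beg occurrences.
 * beg values below 1 are treated as 1, as in the routine. */
size_t nth_occurrence(const char *str, const char *sub, int beg)
{
    assert(str != NULL && sub != NULL);

    size_t oldlen = strlen(sub);
    size_t i;
    int count = 0;

    if (oldlen == 0) {
        return 0;                 /* empty search string: nothing to scan for */
    }
    if (beg < 1) {
        beg = 1;
    }

    for (i = 0; str[i] != '\0'; i++) {
        if (strncmp(&str[i], sub, oldlen) == 0) {
            count++;
            if (count == beg) {
                break;            /* stop ON the match, before skipping it */
            }
            i += oldlen - 1;      /* skip the rest of an earlier match */
        }
    }
    return i;
}
```

With the original ordering, `i` was advanced by `oldlen - 1` before the `break`, so for "TEST AA>BB" the prefix copy ran to index 6 instead of 5 and the replacement loop started in the middle of the match. Single-character searches were unaffected because `oldlen - 1` is 0.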

Posted: Mon Nov 21, 2011 11:32 am
by PhilHibbs
In fact, here is my complete version including malloc fix:
(code removed, see later post for source that fixes another issue)

Posted: Wed Nov 23, 2011 3:59 am
by PhilHibbs
This routine is now causing my job to abort with this:

APT_CombinedOperatorController(5),1: Operator terminated abnormally: received signal SIGSEGV
APT_CombinedOperatorController(5),0: Operator terminated abnormally: received signal SIGSEGV

Any ideas? I have disabled the main body of the routine in order to rule out a programming error:

Code:

#include "string.h"
#include "stdlib.h"

char* pxEreplace(char *str, char *subStr, char *rep, int num, int beg)
{
  int buflen = strlen(str)+1;
  char *result = (char *)malloc( buflen );
  int newlen = strlen(rep);
  int oldlen = strlen(subStr);
  int i, x, count = 0;

  if (result==0) {return 0;}

strcpy(result,str);
return result;
...
*Update* It works fine in a test job that has a row generator that feeds 1000000 rows into a Transformer, that does 4 different replaces including a nested call that replaces two strings:

Code:

pxEreplace( pxEreplace( "AA>BB", ">", "=", 0, 1 ), "AA", "BB", 0, 0 )

Posted: Wed Nov 23, 2011 4:42 am
by PhilHibbs
PhilHibbs wrote:This routine is now causing my job to abort with this:

APT_CombinedOperatorController(5),1: Operator terminated abnormally: received signal SIGSEGV
APT_CombinedOperatorController(5),0: Operator terminated abnormally: received signal SIGSEGV
I think I have found it - it's if the str parameter is an empty string. Not sure why this is, as the function should just sail through and do nothing...

I have worked around by adding this to the start:

Code:

  if (buflen==1) {result[0]='\0'; return result;}

Posted: Fri Nov 25, 2011 12:32 pm
by PhilHibbs
DataStage passes a null pointer rather than an empty string. This should fix the problems that that causes:

Code:

/******************************************************************************
* pxEreplace - DataStage parallel routine
*
* Published on DSXchange.com by user DSguru2B
* http://www.dsxchange.com/viewtopic.php?t=106358
*
* Bugs (malloc, realloc, count) fixed by Philip Hibbs, Capgemini
*
* INSTRUCTIONS
*
* 1. Copy the source file pxEreplace.cpp into a directory on the server
* 2. Run the following command:
*
*         g++ -O -fPIC -Wno-deprecated -c pxEreplace.cpp
*
* (check Administrator->Properties->Environment->Parallel->Compiler settings)
*
* 3. Copy the output into the DataStage library directory:
*
*         cp pxEreplace.o `cat /.dshome`/../PXEngine/lib/pxEreplace.o
*
* 4. Create the Parallel Routine with the following properties:
*
* Routine Name             : pxEreplace
* External subroutine name : pxEreplace
* Type                     : External function
* Object type              : Object
* Return type              : char*
* Library path             : /software/opt/IBM/InformationServer/Server/PXEngine/lib/pxEreplace.o
* Arguments:
*     str     I  char*
*     subStr  I  char*
*     rep     I  char*
*     num     I  int
*     beg     I  int
*
* Save & Close
*
* Any time that anything changes, you must recompile all jobs that use the routine.
*
******************************************************************************/

#include "string.h"
#include "stdlib.h"

char* pxEreplace(char *str, char *subStr, char *rep, int num, int beg)
{
  char empty[1]="";

  if (!str) {str = empty;}
  if (!subStr) {subStr = empty;}
  if (!rep) {rep = empty;}

  int buflen = strlen(str)+1;
  char *result = (char *)malloc( buflen );

  if (!result) {return 0;}
  if (buflen==1) {result[0]='\0'; return result;}

  int oldlen = strlen(subStr);
  int newlen = strlen(rep);

  int i, x, count = 0;

  if (oldlen==0)
  { // special case - insert rep once at the start of the string and return
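    if (newlen>0)
    {
      buflen = buflen + newlen;
      char *grown = (char *)realloc( result, buflen );
      if (!grown) {free(result); return 0;} // out of memory: release the old buffer
      result = grown;
    }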
    strcpy(result, rep);
    strcpy(result+newlen, str);
    return result;
  }

  //If beginning is less than or equal to 1 then default it to 1
  if (beg <= 1)
  {beg = 1;}

  //replace all instances if value of num less than or equal to 0
  if (num <= 0)
  {num = buflen;}

  //Get the character position in i for substring instance to start from
  for (i = 0; str[i] != '\0' ; i++)
  {
    if (strncmp(&str[i], subStr, oldlen) == 0)
    {
      count++;
      if (count == beg) { break; }
      i += oldlen - 1;
    }
  }

  //Get everything before position i before replacement begins

  x = 0;
  while (i != x)
  {  result[x++] = *str++; }

  //Start replacement
  while (*str) //for the complete input string
  {

    if (num != 0 ) // until no more occurrences need to be changed
    {
      if (strncmp(str, subStr, oldlen) == 0)
      {
        if (newlen > oldlen)
        {
          buflen = buflen + (newlen - oldlen);
          char *grown = (char *)realloc( result, buflen );
          if (!grown) {free(result); return 0;} // out of memory: release the old buffer
          result = grown;
        }
        strcpy(&result[x], rep);
        x += newlen;
        str += oldlen;
        num--;
      }
      else // if no match is found
      {
        result[x++] = *str++;
      }
    }
    else
    {
      result[x++] = *str++;
    }
  }

  result[x] = '\0'; //Terminate the string
  return result; //Return the replaced string
}
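
The null-pointer guards at the top of the routine can be tried in isolation. This is a minimal sketch, assuming a standalone C build; `or_empty` and `safe_len` are hypothetical helper names. Calling `strlen` on a null pointer is undefined behaviour, which is consistent with the SIGSEGV reported earlier in the thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: map a null pointer to an empty string so that
 * strlen() and friends are never handed NULL. */
const char *or_empty(const char *s)
{
    return s ? s : "";
}

/* Hypothetical helper: length of s, treating NULL as "". */
size_t safe_len(const char *s)
{
    return strlen(or_empty(s));
}
```

The routine does the same thing with a local `empty` array, which also works because the pointer is only used inside the call.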

Posted: Wed Sep 28, 2016 3:21 am
by RiyaNY
Where should I put this code?
I want to use this function to replace a string in a Transformer, but I have no clue how to go ahead with this code.

Posted: Wed Sep 28, 2016 6:54 am
by chulett
Don't take this the wrong way but if you're not skilled in all of the ways of C++ then this isn't a path for you. IMHO, you'd be better served by starting a new post and letting us know what kind of a 'string problem' you are having. Then we can suggest alternatives.