px version of Ereplace()

krishna81
Premium Member
Posts: 78
Joined: Tue May 16, 2006 8:01 am
Location: USA

Issue with routine.

Post by krishna81 »

Hi, I tried the above routine and I have an issue. I am able to process a small volume of records (e.g. 1,000), but when I ran it with 15 million records the job hung with no warnings. Is there a buffer issue in this C program?
Datastage User
krishna81
Premium Member
Posts: 78
Joined: Tue May 16, 2006 8:01 am
Location: USA

Post by krishna81 »

Here is the program I tried; it does not work for large record volumes. I would appreciate any suggestions.


#include "stdio.h"
#include "string.h"
#include "stdlib.h"

char* pxEreplace(char *str, char *subStr, char *rep, int num, int beg)
{
char *result = (char *)malloc (sizeof(char *));
int newlen = strlen(rep);
int oldlen = strlen(subStr);
int i, x, count = 0;

//If beginning is less than or equal to 1 then default it to 1
if (beg <= 1)
{beg = 1;}

//replace all instances if value of num less than or equal to 0
if (num <= 0)
{num = strlen(str);}

//Get the character position in i for substring instance to start from
for (i = 0; str[i] != '\0' ; i++)
{
if (strstr(&str[i], subStr) == &str[i])
{
count++;
i += oldlen - 1;
if (count == beg)
{break;}
}
}

//Get everything before position i before replacement begins

x = 0;
while (i != x)
{ result[x++] = *str++; }

//Start replacement
while (*str) //for the complete input string
{

if (num != 0 ) // until no more occurrences need to be changed
{
if (strstr(str, subStr) == str )
{
strcpy(&result[x], rep);
x += newlen;
str += oldlen;
num--;
}
else // if no match is found
{
result[x++] = *str++;
}
}
else
{
result[x++] = *str++;
}
}

result[x] = '\0'; //Terminate the string
return result; //Return the replaced string

}
Datastage User
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Suggest you create a new Topic in the PX forum for this.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

At first glance this looks like a memory allocation problem caused by the malloc call.

Try declaring it as result[1000], or size it to your maximum string length.

Alternatively, you can use another variable at the end to hold the return value and free the result.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Krishna was kind enough to fix the memory allocation issue with large record volumes by incorporating Sainath's suggestion. Click here for the updated routine.
Ever since I wrote this and moved on to other clients, I haven't had a chance to work at an enterprise shop, so I could not update the routine based on feedback. I am glad others are taking the initiative and making it better.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Rob4732
Premium Member
Posts: 66
Joined: Mon Oct 06, 2008 5:14 pm

Post by Rob4732 »

Noticed that if your replacement string is empty (""), the job ends with a fatal SIGSEGV. You can probably use Trim to remove a string instead, though.

thx
We don't see things as they are;
We see them as we are.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Rob4732 wrote: Noticed that if your replacement string is empty (""), the job ends with a fatal SIGSEGV. You can probably use Trim to remove a string instead, though.

thx
That's true. You would use pxEreplace() for any string replacement. If you want to get rid of a string, replace it with a space and then apply the Convert() function to remove the spaces.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Post by PhilHibbs »

DSguru2B wrote:Ok, i finally got the chance to complete it.
Pardon me if my C/C++ coding skills are rusty, but I think there is a serious issue with this.

Code: Select all

  char *result = (char *)malloc (sizeof(char *));
CMIIW, but that allocates a buffer the size of a pointer (4 bytes on a 32-bit build, 8 bytes on a 64-bit one), so any result string that does not fit will overrun the allocated buffer and corrupt the heap.

To be honest, I can't see a way around this. Even if you allocate enough memory in the routine, you are returning a pointer to a buffer that will never be deallocated, and thus creating a memory leak. If you declare a static buffer, then it is not parallel-safe as every instance of the routine will have the same buffer. Does DataStage provide a hook for allocating memory that it will deallocate correctly afterwards? *Edit* Unless DataStage will always call free() on any char* that is returned in this way?
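To illustrate the sizing point, here is a minimal sketch in plain C. It is not the routine's code, and the helper name alloc_copy is hypothetical. It sizes the allocation from the string length plus one byte for the terminator, whereas sizeof(char *) is only the size of a pointer. It does not answer the ownership question: the caller still has to free() the buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of src, or NULL if src is NULL
 * or the allocation fails. The caller owns the buffer and must free() it. */
char *alloc_copy(const char *src)
{
    if (src == NULL) {
        return NULL;
    }
    size_t needed = strlen(src) + 1;      /* characters plus the '\0' terminator */
    char *copy = (char *)malloc(needed);  /* not malloc(sizeof(char *)) */
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, src, needed);
    return copy;
}
```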
Last edited by PhilHibbs on Mon Nov 21, 2011 11:08 am, edited 1 time in total.
Phil Hibbs | Capgemini
Technical Consultant
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Post by PhilHibbs »

Additionally, I cannot get this test case to work:

Code: Select all

pxEreplace( "TEST AA>BB", "AA", "BB", 0, 1 )
The output of that is "TEST AA>BB", rather than "TEST BB>BB". I can't get any multi-character replacement to work.

This works fine:

Code: Select all

pxEreplace( "TEST P>Q PP>QQ", "P", "Q", 0, 1 )
...and returns this:

Code: Select all

TEST Q>Q QQ>QQ
*UPDATE* Fixed it! The error is here:

Code: Select all

     if (strstr(&str[i], subStr) == &str[i]) 
     { 
      count++; 
      i += oldlen - 1; 
      if (count == beg) 
      {break;} 
     } 
I changed it to this:

Code: Select all

     if (strstr(&str[i], subStr) == &str[i]) 
     { 
      count++; 
      if (count == beg) 
      {break;} 
      i += oldlen - 1; 
     } 
Also, for performance reasons I replaced all references to strstr with strncmp instead:

Code: Select all

     if (strncmp(&str[i], subStr, oldlen) == 0)
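The corrected search can be sketched as a standalone helper. This is only an illustration under my own assumptions: the name nth_match_offset is hypothetical, and treating an empty search string as "no match" is a choice made here, not something the routine does. The point is that the loop has to stop at the start of the matching occurrence, before skipping past it, and that strncmp only looks at the next few characters instead of scanning the rest of the string.

```c
#include <string.h>

/* Hypothetical helper: returns the offset of the beg-th non-overlapping
 * occurrence of sub in str (beg counts from 1), or strlen(str) if there
 * are fewer than beg occurrences or sub is empty. */
size_t nth_match_offset(const char *str, const char *sub, int beg)
{
    size_t sublen = strlen(sub);
    size_t i = 0;
    int count = 0;

    if (sublen == 0) {
        return strlen(str);     /* assumption: empty search string never matches */
    }
    if (beg < 1) {
        beg = 1;
    }

    while (str[i] != '\0') {
        /* Compare only the next sublen characters at this position. */
        if (strncmp(&str[i], sub, sublen) == 0) {
            count++;
            if (count == beg) {
                return i;       /* stop before skipping past the match */
            }
            i += sublen;        /* skip an occurrence that was only counted */
        } else {
            i++;
        }
    }
    return i;                   /* no beg-th occurrence: offset of the terminator */
}
```

With the failing test case from above, nth_match_offset("TEST AA>BB", "AA", 1) gives offset 5, the first "A", so the replacement loop starts on a full match.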
Last edited by PhilHibbs on Mon Nov 21, 2011 11:08 am, edited 1 time in total.
Phil Hibbs | Capgemini
Technical Consultant
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Post by PhilHibbs »

In fact, here is my complete version including malloc fix:
(code removed, see later post for source that fixes another issue)
Last edited by PhilHibbs on Wed Nov 23, 2011 5:57 am, edited 1 time in total.
Phil Hibbs | Capgemini
Technical Consultant
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Post by PhilHibbs »

This routine is now causing my job to abort with this:

APT_CombinedOperatorController(5),1: Operator terminated abnormally: received signal SIGSEGV
APT_CombinedOperatorController(5),0: Operator terminated abnormally: received signal SIGSEGV

Any ideas? I have disabled the main body of the routine in order to rule out a programming error:

Code: Select all

#include "string.h"
#include "stdlib.h"

char* pxEreplace(char *str, char *subStr, char *rep, int num, int beg)
{
  int buflen = strlen(str)+1;
  char *result = (char *)malloc( buflen );
  int newlen = strlen(rep);
  int oldlen = strlen(subStr);
  int i, x, count = 0;

  if (result==0) {return 0;}

strcpy(result,str);
return result;
...
*Update* It works fine in a test job that has a row generator that feeds 1000000 rows into a Transformer, that does 4 different replaces including a nested call that replaces two strings:

Code: Select all

pxEreplace( pxEreplace( "AA>BB", ">", "=", 0, 1 ), "AA", "BB", 0, 0 )
Phil Hibbs | Capgemini
Technical Consultant
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Post by PhilHibbs »

PhilHibbs wrote:This routine is now causing my job to abort with this:

APT_CombinedOperatorController(5),1: Operator terminated abnormally: received signal SIGSEGV
APT_CombinedOperatorController(5),0: Operator terminated abnormally: received signal SIGSEGV
I think I have found it - it's if the str parameter is an empty string. Not sure why this is, as the function should just sail through and do nothing...

I have worked around by adding this to the start:

Code: Select all

  if (buflen==1) {result[0]='\0'; return result;}
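The next post in this thread reports that DataStage passes a null pointer rather than an empty string. If that is the cause, the buflen check alone would not help, because strlen() on a null pointer is undefined behaviour and would already have run. A minimal sketch of a guard follows; the helper names or_empty and safe_len are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: maps a null pointer to an empty string so that
 * strlen() and friends are never called on NULL. */
const char *or_empty(const char *s)
{
    return (s != NULL) ? s : "";
}

/* Length that is safe to compute even when the caller passed NULL. */
size_t safe_len(const char *s)
{
    return strlen(or_empty(s));
}
```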
Last edited by PhilHibbs on Fri Nov 25, 2011 12:32 pm, edited 1 time in total.
Phil Hibbs | Capgemini
Technical Consultant
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Post by PhilHibbs »

DataStage passes a null pointer rather than an empty string. This should fix the problems that that causes:

Code: Select all

/******************************************************************************
* pxEreplace - DataStage parallel routine
*
* Published on DSXchange.com by user DSguru2B
* http://www.dsxchange.com/viewtopic.php?t=106358
*
* Bugs (malloc, realloc, count) fixed by Philip Hibbs, Capgemini
*
* INSTRUCTIONS
*
* 1. Copy the source file pxEreplace.cpp into a directory on the server
* 2. Run the following command:
*
*         g++ -O -fPIC -Wno-deprecated -c pxEreplace.cpp
*
* (check Administrator->Properties->Environment->Parallel->Compiler settings)
*
* 3. Copy the output into the DataStage library directory:
*
*         cp pxEreplace.o `cat /.dshome`/../PXEngine/lib/pxEreplace.o
*
* 4. Create the Parallel Routine with the following properties:
*
* Routine Name             : pxEreplace
* External subroutine name : pxEreplace
* Type                     : External function
* Object type              : Object
* Return type              : char*
* Library path             : /software/opt/IBM/InformationServer/Server/PXEngine/lib/pxEreplace.o
* Arguments:
*     str     I  char*
*     subStr  I  char*
*     rep     I  char*
*     num     I  int
*     beg     I  int
*
* Save & Close
*
* Any time that anything changes, you must recompile all jobs that use the routine.
*
******************************************************************************/

#include "string.h"
#include "stdlib.h"

char* pxEreplace(char *str, char *subStr, char *rep, int num, int beg)
{
  char empty[1]="";

  if (!str) {str = empty;}
  if (!subStr) {subStr = empty;}
  if (!rep) {rep = empty;}

  int buflen = strlen(str)+1;
  char *result = (char *)malloc( buflen );

  if (!result) {return 0;}
  if (buflen==1) {result[0]='\0'; return result;}

  int oldlen = strlen(subStr);
  int newlen = strlen(rep);

  int i, x, count = 0;

  if (oldlen==0)
  { // special case - insert rep once at the start of the string and return
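    if (newlen>0)
    {
      buflen = buflen + newlen;
      char *grown = (char *)realloc( result, buflen );
      if (!grown) {free(result); return 0;} // realloc failed; release the old buffer
      result = grown;
    }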
    strcpy(result, rep);
    strcpy(result+newlen, str);
    return result;
  }

  //If beginning is less than or equal to 1 then default it to 1
  if (beg <= 1)
  {beg = 1;}

  //replace all instances if value of num less than or equal to 0
  if (num <= 0)
  {num = buflen;}

  //Get the character position in i for substring instance to start from
  for (i = 0; str[i] != '\0' ; i++)
  {
    if (strncmp(&str[i], subStr, oldlen) == 0)
    {
      count++;
      if (count == beg) { break; }
      i += oldlen - 1;
    }
  }

  //Get everything before position i before replacement begins

  x = 0;
  while (i != x)
  {  result[x++] = *str++; }

  //Start replacement
  while (*str) //for the complete input string
  {

    if (num != 0 ) // until no more occurrences need to be changed
    {
      if (strncmp(str, subStr, oldlen) == 0)
      {
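        if (newlen > oldlen)
        {
          buflen = buflen + (newlen - oldlen);
          char *grown = (char *)realloc( result, buflen );
          if (!grown) {free(result); return 0;} // realloc failed; release the old buffer
          result = grown;
        }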
        strcpy(&result[x], rep);
        x += newlen;
        str += oldlen;
        num--;
      }
      else // if no match is found
      {
        result[x++] = *str++;
      }
    }
    else
    {
      result[x++] = *str++;
    }
  }

  result[x] = '\0'; //Terminate the string
  return result; //Return the replaced string
}
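An alternative to growing the buffer with realloc on each replacement is to count the matches first and allocate once. The following is a sketch of that idea, not the routine posted above. The name ereplace_two_pass is hypothetical, returning a plain copy when the search string is empty is my own simplification, and it has not been tested inside DataStage. Who frees the returned buffer remains the open question raised earlier in this thread.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical two-pass variant: replaces up to num occurrences of sub in
 * str with rep, starting at the beg-th occurrence. num <= 0 means "all",
 * beg <= 1 means "from the first". Returns a malloc'd string that the
 * caller must free(), or NULL if allocation fails. NULL arguments are
 * treated as empty strings; an empty sub returns a plain copy of str. */
char *ereplace_two_pass(const char *str, const char *sub, const char *rep,
                        int num, int beg)
{
    if (str == NULL) str = "";
    if (sub == NULL) sub = "";
    if (rep == NULL) rep = "";

    size_t len = strlen(str);
    size_t oldlen = strlen(sub);
    size_t newlen = strlen(rep);
    if (beg < 1) beg = 1;

    /* Pass 1: count how many occurrences will actually be replaced. */
    size_t replaced = 0;
    if (oldlen > 0) {
        int seen = 0;
        for (const char *p = str; *p != '\0'; ) {
            if (strncmp(p, sub, oldlen) == 0) {
                seen++;
                if (seen >= beg && (num <= 0 || replaced < (size_t)num)) {
                    replaced++;
                }
                p += oldlen;
            } else {
                p++;
            }
        }
    }

    /* Every replacement removes oldlen bytes and adds newlen bytes. */
    size_t outlen = len - replaced * oldlen + replaced * newlen;
    char *result = (char *)malloc(outlen + 1);
    if (result == NULL) return NULL;

    /* Pass 2: copy, substituting the same occurrences counted above. */
    char *out = result;
    int seen = 0;
    size_t done = 0;
    for (const char *p = str; *p != '\0'; ) {
        if (oldlen > 0 && strncmp(p, sub, oldlen) == 0) {
            seen++;
            if (seen >= beg && done < replaced) {
                memcpy(out, rep, newlen);
                out += newlen;
                done++;
            } else {
                memcpy(out, p, oldlen);
                out += oldlen;
            }
            p += oldlen;
        } else {
            *out++ = *p++;
        }
    }
    *out = '\0';
    return result;
}
```

Because the output length is known before anything is written, an empty replacement string needs no special handling here, and there is only one allocation to check.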
Phil Hibbs | Capgemini
Technical Consultant
RiyaNY
Participant
Posts: 12
Joined: Wed Jan 15, 2014 10:32 pm
Location: Mumbai

Post by RiyaNY »

Where should I put this code?
I want to use this function to replace a string in a Transformer, but I have no clue how to go ahead with this code.
Warm Regards,
Riya Yawalkar
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Don't take this the wrong way but if you're not skilled in all of the ways of C++ then this isn't a path for you. IMHO, you'd be better served by starting a new post and letting us know what kind of a 'string problem' you are having. Then we can suggest alternatives.
-craig

"You can never have too many knives" -- Logan Nine Fingers