
Parallel routine appears to have a memory leak

Posted: Wed Sep 26, 2012 5:05 am
by PhilHibbs
Some time ago I took the pxEreplace routine that was posted on this forum and made a few fixes to it and started using it. It appears to be leaking memory, though. I set up a simple job that uses a Row Generator to generate 1,000,000 rows and writes them out to a Peek stage which logs the first 1000 rows.

I put all four of my parallel routines into the job, and with 10 parallel runs, our development box ran out of swap. So, clearly a lot of memory is being used. I ran a baseline test with no parallel routines, and memory usage was negligible.

Two of the routines do not allocate memory, so I ran a test with just those routines, and it also used negligible memory.

Test runs with each routine that does allocate memory showed significant memory usage, and with both together, I managed to use up all the swap space and get the jobs to fail with "main_program: APT_PMsectionLeader(1, node1), player 1 - Unexpected termination by Unix signal 9(SIGKILL)."

So: what's wrong with my routines? Source to follow in two separate replies.

*Edit* We are raising this issue with IBM, and will report any conclusions here, although I'm only at this client for two more days.

Posted: Wed Sep 26, 2012 5:06 am
by PhilHibbs

Code:

/******************************************************************************
* pxEreplace - DataStage parallel routine
*
* Published on DSXchange.com by user DSguru2B
* http://www.dsxchange.com/viewtopic.php?t=106358
*
* Bugs (malloc, realloc, count) fixed by Philip Hibbs, Capgemini
*
* INSTRUCTIONS
*
* 1. Copy the source file pxEreplace.cpp into a directory on the server
* 2. Run the following command:
*
*         g++ -O -fPIC -Wno-deprecated -c pxEreplace.cpp
*
* (check Administrator->Properties->Environment->Parallel->Compiler settings)
*
* 3. Copy the output into the DataStage library directory:
*
*         cp pxEreplace.o `cat /.dshome`/../PXEngine/lib/pxEreplace.o
*
* 4. Create the Parallel Routine with the following properties:
*
* Routine Name             : pxEreplace
* External subroutine name : pxEreplace
* Type                     : External function
* Object type              : Object
* Return type              : char*
* Library path             : /software/opt/IBM/InformationServer/Server/PXEngine/lib/pxEreplace.o
* Arguments:
*     str     I  char*
*     subStr  I  char*
*     rep     I  char*
*     num     I  int
*     beg     I  int
*
* Save & Close
*
* Any time that anything changes, you must recompile all jobs that use the routine.
*
******************************************************************************/

#include <string.h>
#include <stdlib.h>

char* pxEreplace(char *str, char *subStr, char *rep, int num, int beg)
{
  char empty[1]="";

  if (!str) {str = empty;}
  if (!subStr) {subStr = empty;}
  if (!rep) {rep = empty;}

  int buflen = strlen(str)+1;
  char *result = (char *)malloc( buflen );

  if (!result) {return 0;}
  if (buflen==1) {result[0]='\0'; return result;}

  int oldlen = strlen(subStr);
  int newlen = strlen(rep);

  int i, x, count = 0;

  if (oldlen==0)
  { // special case - insert rep once at the start of the string and return
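    if (newlen>0)
    {
      buflen = buflen + newlen;
      char *grown = (char *)realloc( result, buflen );
      if (!grown) {free(result); return 0;} // realloc failed: release the old block
      result = grown;
    }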
    strcpy(result, rep);
    strcpy(result+newlen, str);
    return result;
  }

  //If beginning is less than or equal to 1 then default it to 1
  if (beg <= 1)
  {beg = 1;}

  //replace all instances if value of num less than or equal to 0
  if (num <= 0)
  {num = buflen;}

  //Get the character position in i for substring instance to start from
  for (i = 0; str[i] != '\0' ; i++)
  {
    if (strncmp(&str[i], subStr, oldlen) == 0)
    {
      count++;
      if (count == beg) { break; }
      i += oldlen - 1;
    }
  }

  //Get everything before position i before replacement begins

  x = 0;
  while (i != x)
  {  result[x++] = *str++; }

  //Start replacement
  while (*str) //for the complete input string
  {

    if (num != 0 ) // until no more occurrences need to be changed
    {
      if (strncmp(str, subStr, oldlen) == 0)
      {
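        if (newlen > oldlen)
        {
          buflen = buflen + (newlen - oldlen);
          char *grown = (char *)realloc( result, buflen );
          if (!grown) {free(result); return 0;} // realloc failed: release the old block
          result = grown;
        }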
        strcpy(&result[x], rep);
        x += newlen;
        str += oldlen;
        num--;
      }
      else // if no match is found
      {
        result[x++] = *str++;
      }
    }
    else
    {
      result[x++] = *str++;
    }
  }

  result[x] = '\0'; //Terminate the string
  return result; //Return the replaced string
}

Posted: Wed Sep 26, 2012 5:06 am
by PhilHibbs

Code:

/******************************************************************************
* pxStrFilter - DataStage parallel routine
*
* Filters a string so that it only contains the characters in the specified list
*
* INSTRUCTIONS
*
* 1. Copy the source file pxStrFilter.cpp into a directory on the server
* 2. Run the following command:
*
*         g++ -O -fPIC -Wno-deprecated -c pxStrFilter.cpp
*
* (check Administrator->Properties->Environment->Parallel->Compiler settings)
*
* 3. Copy the output into the DataStage library directory:
*
*         cp pxStrFilter.o `cat /.dshome`/../PXEngine/lib/pxStrFilter.o
*
* 4. Create the Parallel Routine with the following properties:
*
* Routine Name             : pxStringFilter
* External subroutine name : pxStrFilter
* Type                     : External function
* Object type              : Object
* Return type              : char*
* Library path             : /software/opt/IBM/InformationServer/Server/PXEngine/lib/pxStrFilter.o
* Arguments:
*     str     I  char*
*     chars   I  char*
*
* Save & Close
*
* Any time that anything changes, you must recompile all jobs that use the routine.
*
******************************************************************************/

#include <string.h>
#include <stdlib.h>

char* pxStrFilter(char *str, char *chars )
{
  if (!str) {return 0;}
  if (!chars) {return 0;}

  int buflen = strlen(str)+1;
  char *result = (char *)malloc( buflen );
  int dest, src;

  if (result==0) {return 0;}
  if (buflen==1) {result[0]='\0'; return result;}
  if (strlen(chars)==0) {result[0]='\0'; return result;}

  dest = src = 0;

  //Start filtering
  while (str[src]) //for the complete input string
  {
    if (strchr(chars, str[src]))
    {
      result[dest] = str[src];
      ++dest;
    }
    ++src;
  }

  result[dest] = '\0'; //Terminate the string
  return result; //Return the filtered string
}

Posted: Thu Sep 27, 2012 4:02 am
by PhilHibbs
IBM's response is this:
From my understanding of the issue, when you are discussing a parallel routine we are talking about a C function written by yourselves, is that correct?

If it is correct the allocation and de-allocation of memory should be encapsulated in the C function or routine.

We do not expect the DataStage Server to de-allocate the memory allocation when a routine/function ends.
So this function must call free() itself. Which raises the question, how can it return a pointer that has been freed?

Posted: Thu Sep 27, 2012 4:07 am
by ray.wurlod
PhilHibbs wrote: Which raises the question, how can it return a pointer that has been freed?
Perhaps THAT is the question you should be putting to IBM?

Posted: Thu Sep 27, 2012 4:41 am
by eph
Hi,

I'm also interested in IBM's answer, as I have developed some C++ routines that could consume a lot of RAM. The only job of mine that consumes a lot of RAM for a small number of rows calls a px routine 3-4 times in stage variables.

Eric

Posted: Thu Sep 27, 2012 5:00 am
by ArndW
In writing custom build-ops and operators I've found that the only place with potential for a memory leak is when using the built-in pointers to input record contents. These are not allocated on a word boundary, as they would be with malloc() and the like, so these record buffers are difficult to copy (for temporary storage) and later write back. DataStage allocates and de-allocates the record buffer addresses by itself, and this potential offset of up to 8 bytes can prevent a manual free() from succeeding, which means that the allocated buffer space grows continually until there is no more room or the job ends.

In your case you cannot free() up the return pointer's memory space within the job, but I don't think that this is the memory leak.

Posted: Thu Sep 27, 2012 5:39 am
by PhilHibbs
ArndW wrote:In your case you cannot free() up the return pointer's memory space within the job, but I don't think that this is the memory leak.
How can it not be a memory leak? The pxEreplace function is calling malloc and realloc and returning the pointer, DataStage is not calling free on the return pointer (which kind of makes sense since it doesn't know if that pointer was created using malloc or not) and so the memory usage just builds up and up until it runs out. Seems like a pretty straightforward memory leak to me.
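
A minimal sketch of the pattern described above, not DataStage code: `copyRoutine`, `releaseResult`, the two caller functions and the `outstanding` counter are all hypothetical names made up for illustration. It assumes only what IBM stated, namely that the caller never calls free() on the returned pointer.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical bookkeeping: buffers handed out by the routine and not yet freed.
static long outstanding = 0;

// Stand-in for a routine such as pxEreplace: returns a malloc'd copy of its input.
char* copyRoutine(const char* str)
{
    char* result = (char*)std::malloc(std::strlen(str) + 1);
    if (!result) { return 0; }
    std::strcpy(result, str);
    ++outstanding;
    return result;
}

// Releases a buffer returned by copyRoutine.
void releaseResult(char* p)
{
    if (!p) { return; }
    assert(outstanding > 0);   // cannot release more than was handed out
    std::free(p);
    --outstanding;
}

// Models a caller that reads the result and then drops the pointer.
long callerThatNeverFrees(long rows)
{
    for (long i = 0; i < rows; ++i) {
        char* value = copyRoutine("some row value");
        (void)value;           // pointer is lost here; the block stays allocated
    }
    return outstanding;
}

// Models a caller that owns the result and frees it after use.
long callerThatFrees(long rows)
{
    for (long i = 0; i < rows; ++i) {
        releaseResult(copyRoutine("some row value"));
    }
    return outstanding;
}
```

Starting from zero, callerThatFrees(1000) leaves the counter at 0, while callerThatNeverFrees(1000) leaves one block allocated per row. With a million rows per run and ten parallel runs, that is the growth seen in the test job.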

Posted: Thu Sep 27, 2012 6:11 am
by ArndW
Phil,

I see what you mean and yes, the malloc()d space for the return pointer value may be the culprit - but on the other hand DataStage may be free()ing up that allocated space. Since the reference in a free() call is a void pointer it could be done within DataStage.

Although in c++ don't return values get passed by value (i.e. a copy of the contents is made) and then doesn't the variable "result" go out of scope immediately thereafter and the allocated space free()d automatically, or does it just go out of scope and the space is "leaked" away?

Posted: Thu Sep 27, 2012 6:34 am
by PhilHibbs
ArndW wrote:...DataStage may be free()ing up that allocated space. Since the reference in a free() call is a void pointer it could be done within DataStage.
IBM have explicitly stated that this does not happen.
ArndW wrote:Although in c++ don't return values get passed by value (i.e. a copy of the contents is made) and then doesn't the variable "result" go out of scope immediately thereafter and the allocated space free()d automatically, or does it just go out of scope and the space is "leaked" away?
Return values are passed by value, but the value of a char* is a pointer, and it is that value which is copied, not the contents of the memory it points to. If a variable that goes out of scope is an object (and not a pointer to an object), then that object's destructor will be called, and a destructor written to do so will free any memory the object allocated with malloc, but there are no objects here. It would not help to use an object here anyway, because the destructor would run, and free the memory, before DataStage got the chance to copy the results.

If this were Java then the memory would be reclaimed at some point after all references to it disappear, but C and C++ do not have automatic garbage collection of unreferenced memory.
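
To make the scope rules concrete, here is a small sketch in plain C++ (C++98 style, to match the g++ command in the routine headers). `OwnedBuffer`, `rawCopy`, `liveBuffers` and the two `liveAfter...` functions are hypothetical names used only for this illustration. It shows that a destructor releases memory only because it is written to, and that a raw char* going out of scope releases nothing.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical counter of malloc'd buffers that have not been freed yet.
static int liveBuffers = 0;

// An object that owns a malloc'd buffer and frees it in its destructor.
struct OwnedBuffer {
    char* data;
    explicit OwnedBuffer(const char* s)
        : data((char*)std::malloc(std::strlen(s) + 1))
    {
        if (data) { std::strcpy(data, s); ++liveBuffers; }
    }
    ~OwnedBuffer()
    {
        if (data) {
            assert(liveBuffers > 0);
            std::free(data);   // the free happens only because it is written here
            --liveBuffers;
        }
    }
private:
    OwnedBuffer(const OwnedBuffer&);            // copying disabled
    OwnedBuffer& operator=(const OwnedBuffer&);
};

// Returns a raw pointer: only the address is copied to the caller.
char* rawCopy(const char* s)
{
    char* p = (char*)std::malloc(std::strlen(s) + 1);
    if (p) { std::strcpy(p, s); ++liveBuffers; }
    return p;
}

int liveAfterObjectScope()
{
    { OwnedBuffer b("abc"); }                   // destructor runs at the closing brace
    return liveBuffers;
}

int liveAfterRawPointerScope()
{
    { char* p = rawCopy("abc"); (void)p; }      // p disappears, the block does not
    return liveBuffers;
}
```

Called in that order from a fresh start, the first function reports 0 live buffers and the second reports 1. Returning `b.data` from inside the first scope would hand back memory that has already been freed, which is why wrapping the buffer in an object does not help a routine that must return a char*.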

Posted: Thu Sep 27, 2012 7:34 am
by ArndW
Well - that really sucks; a memory-leak-by-design!

Posted: Fri Sep 28, 2012 3:24 am
by PhilHibbs
ArndW wrote:Well - that really sucks; a memory-leak-by-design!
Are you talking about C in general, or just in DataStage? If the latter, then I agree - it is impossible to write a Parallel Routine that returns a char* that doesn't do one of the following things:

1. Leak memory
2. Return dangerous dangling pointers to memory that could be trashed before DataStage picks up the return value
3. Just return a fixed piece of static text

IBM, you need to sort this out!
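
For illustration, option 2 in the list above could look something like the sketch below. `pxStrFilterReuse` is a hypothetical name, not a routine from this thread. It keeps one buffer and reuses it on every call, so memory use stays bounded, but the returned pointer is overwritten by the next call. Whether DataStage copies the string before calling the routine again is an assumption that nobody in this thread has confirmed, and the static buffer is not safe if the routine is called from more than one thread in the same process.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical variant of pxStrFilter that reuses one buffer across calls.
// ASSUMPTION: the caller copies the result before the next call is made.
char* pxStrFilterReuse(char* str, char* chars)
{
    static char* buffer = 0;          // grown when needed, never freed
    static std::size_t capacity = 0;

    if (!str || !chars) { return 0; }

    std::size_t needed = std::strlen(str) + 1;
    if (needed > capacity) {
        char* grown = (char*)std::realloc(buffer, needed);
        if (!grown) { return 0; }     // old buffer is still valid and still owned
        buffer = grown;
        capacity = needed;
    }

    std::size_t dest = 0;
    for (std::size_t src = 0; str[src]; ++src) {
        if (std::strchr(chars, str[src])) { buffer[dest++] = str[src]; }
    }
    assert(dest < capacity);          // output is never longer than the input
    buffer[dest] = '\0';
    return buffer;
}
```

The trade-off is visible as soon as the routine is called twice: if the second input is no longer than the first, the second call returns the same address and the first result is gone. That is exactly the "could be trashed before DataStage picks up the return value" risk.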

Posted: Mon Nov 05, 2012 8:01 am
by PhilHibbs
IBM have finally given a sensible response.
IBM wrote:Good Morning,

I did some further research and found some more information:

It would seem that there is no way to free the memory that is allocated in a C transform routine. You should not use malloc in a transform routine.

We suggest that if you have a complex function then you may want to consider using a BuildOp instead. Calling C Transform routines is really meant for routines that return native C data types (primitive data types). Complex functions should be created as BuildOp's or Custom C stages.

I hope this advice helps

Many thanks
Mark H
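
As a rough illustration of the kind of routine IBM describes as safe, here is a sketch that returns a primitive type and allocates nothing. `pxCountSubstr` is a hypothetical name, not an existing routine. The argument types follow the char* convention used by the routines earlier in the thread.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical routine: counts non-overlapping occurrences of subStr in str.
// It returns a plain int, so there is no buffer for anyone to free.
int pxCountSubstr(char* str, char* subStr)
{
    if (!str || !subStr) { return 0; }

    std::size_t oldlen = std::strlen(subStr);
    if (oldlen == 0) { return 0; }    // an empty pattern is treated as "no matches" here
    assert(oldlen > 0);               // guarantees the loop below always advances

    int count = 0;
    while ((str = std::strstr(str, subStr)) != 0) {
        ++count;
        str += oldlen;                // skip the match so occurrences do not overlap
    }
    return count;
}
```

Because the result is returned by value, nothing outlives the call. This is the property that a char* return cannot offer without either a leak or a shared buffer.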

Posted: Mon Nov 05, 2012 8:12 am
by ArndW
While not thrilling news, I'm glad they confirmed what you've already found out. I'm glad I'm using BuildOps here - the one I ran today processed 114 Million records, each about 1Kb wide :)

Posted: Mon Nov 05, 2012 10:34 am
by ray.wurlod
Change() and Ereplace() functions are available in parallel Transformer stage from version 9.1, scheduled for GA in December 2012.