Copy-and-paste error: the second part of the fix ought to be:

        if ( !have_wdir ) {
          free(cwd);
        }

Murat




Murat Knecht wrote:
Hi all,

I think I have found a bug and a fix for it.
Could someone verify the rationale behind this bug? I get this SIGSEGV
on only one of two machines, and I don't quite see why it doesn't
always occur (same test program, identically compiled Open MPI 1.2.4).
The fix does prevent the segmentation fault, though. :)

Thanks,
Murat





Where:
Bug:
free() crashes when trying to free stack memory
ompi/communicator/comm_dyn.c:630
    
    OBJ_RELEASE(apps[i]);


SIGSEGV:
orte/mca/rmgr/rmgr_types.h:113

        free (app_context->cwd);


    
There are two ways that apps[i]->cwd is filled:
1. dynamically allocated memory
548     if ( !have_wdir ) {
            getcwd(cwd, OMPI_PATH_MAX);
            apps[i]->cwd = strdup(cwd);    // <--
        }

2. stack
354    char cwd[OMPI_PATH_MAX];
// ...
516         /* check for 'wdir' */
            ompi_info_get (array_of_info[i], "wdir", valuelen, cwd, &flag);
            if ( flag ) {
                apps[i]->cwd = cwd;  // <--
                have_wdir = 1;
            }
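
To make the ownership problem concrete, here is a minimal sketch. It is not the Open MPI code: struct app_sketch, app_sketch_set_cwd and app_sketch_release are hypothetical stand-ins, and SKETCH_PATH_MAX is an assumed replacement for OMPI_PATH_MAX. It shows a different variant from the fix proposed below: copying the string on both paths, so that the free() in the destructor only ever sees heap memory.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PATH_MAX 4096  /* assumed stand-in for OMPI_PATH_MAX */

/* Hypothetical, reduced stand-in for the app context; only cwd matters. */
struct app_sketch {
    char *cwd;  /* released with free(), so it must point to heap memory */
};

/* Mirrors the destructor line quoted above: free(app_context->cwd). */
static void app_sketch_release(struct app_sketch *app)
{
    assert(app != NULL);
    free(app->cwd);
    app->cwd = NULL;
}

/* Store a private heap copy of dir, regardless of where dir itself lives
 * (stack buffer, string literal, heap). Returns 0 on success, -1 on
 * allocation failure. */
static int app_sketch_set_cwd(struct app_sketch *app, const char *dir)
{
    char *copy;

    assert(app != NULL && dir != NULL);
    copy = strdup(dir);
    if (copy == NULL) {
        return -1;
    }
    free(app->cwd);  /* drop any previous value; free(NULL) is a no-op */
    app->cwd = copy;
    return 0;
}
```

Copying on both paths means each app owns its own string, which would also hold up if more than one app were given a directory from the same buffer.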



Fix: Always allocate cwd on the heap and make sure it is freed afterwards.

1.
<    char cwd[OMPI_PATH_MAX];
---
>    char *cwd = (char*)malloc(OMPI_PATH_MAX);

2. And on cleanup (somewhere below line 624)

  
       if ( !have_wdir ) {
           getcwd(cwd, OMPI_PATH_MAX);
           apps[i]->cwd = strdup(cwd);
       }
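
Putting both parts together, with the cleanup step as corrected in the follow-up at the top of this thread (free(cwd) when no wdir was given), the fix could be sketched as follows. This is a reduced illustration for a single app, not the actual comm_dyn.c code: struct spawn_app_sketch, spawn_sketch, the wdir parameter and FIX_PATH_MAX are all hypothetical stand-ins.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() and getcwd() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define FIX_PATH_MAX 4096  /* assumed stand-in for OMPI_PATH_MAX */

/* Hypothetical stand-in for the app context; cwd is later free()d. */
struct spawn_app_sketch {
    char *cwd;
};

/* wdir == NULL models "no 'wdir' info key was given".
 * Returns 0 on success, -1 on failure. */
static int spawn_sketch(struct spawn_app_sketch *app, const char *wdir)
{
    int have_wdir = 0;
    char *cwd;

    assert(app != NULL);
    cwd = malloc(FIX_PATH_MAX);  /* part 1: heap instead of stack */
    if (cwd == NULL) {
        return -1;
    }

    if (wdir != NULL) {
        strncpy(cwd, wdir, FIX_PATH_MAX - 1);
        cwd[FIX_PATH_MAX - 1] = '\0';
        app->cwd = cwd;  /* the app now owns the heap buffer */
        have_wdir = 1;
    }

    if (!have_wdir) {
        if (getcwd(cwd, FIX_PATH_MAX) == NULL) {
            free(cwd);
            return -1;
        }
        app->cwd = strdup(cwd);  /* the app owns a separate copy */
        if (app->cwd == NULL) {
            free(cwd);
            return -1;
        }
    }

    /* part 2: on cleanup, free the buffer only if it was not handed over */
    if (!have_wdir) {
        free(cwd);
    }
    return 0;
}
```

In both branches the app ends up with exactly one heap pointer, so a later free(app->cwd) is valid, and the temporary buffer is released exactly once.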
    
