Open MPI Development Mailing List Archives

Subject: [OMPI devel] srun + Intel OpenMP = SIGSEGV
From: Damien Guinier (damien.guinier_at_[hidden])
Date: 2010-06-15 11:32:44


Using Intel OpenMP in conjunction with srun seems to cause a
segmentation fault, at least in the 1.5 branch.

After a long time tracking this strange bug, I finally found out that
the slurmd ess component was corrupting the __environ structure. This
results in a crash in Intel OpenMP, which calls getenv() after
MPI_Finalize.

In fact, during MPI_Init, the slurmd component calls putenv(), which
stores in __environ a pointer to a constant string located in the
component's mmap'ed text, not a copy of it. At MPI_Finalize, the
component is dlclose()d and unmapped, which leaves that __environ
entry pointing to memory that no longer exists.

Since Intel OpenMP is looking for an environment variable that does not
exist, getenv() walks every entry in __environ, reaches the dangling
one, and crashes.
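
The putenv() behaviour behind this can be shown without MPI or SLURM.
The sketch below is only an illustration, not code from Open MPI: the
variable names DEMO_PUTENV_VAR and DEMO_SETENV_VAR and both function
names are made up. It relies on the POSIX rule that putenv() makes the
caller's string part of the environment, whereas setenv() stores its
own copy.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600   /* expose putenv() and setenv() under strict -std modes */
#endif
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry; static so the storage stays valid after the call. */
static char demo_entry[] = "DEMO_PUTENV_VAR=first";

/* Returns 1 if the environment shares storage with our buffer after putenv(). */
int putenv_aliases_buffer(void) {
    if (putenv(demo_entry) != 0)
        return -1;
    /* Overwrite the value part of our own buffer ("first" -> "later"). */
    memcpy(demo_entry + strlen("DEMO_PUTENV_VAR="), "later", 5);
    const char *v = getenv("DEMO_PUTENV_VAR");
    return v != NULL && strcmp(v, "later") == 0;
}

/* Returns 1 if setenv() kept its own copy of the value. */
int setenv_copies_value(void) {
    char value[] = "first";
    if (setenv("DEMO_SETENV_VAR", value, 1) != 0)
        return -1;
    /* Changing the local buffer must not affect the environment. */
    memcpy(value, "later", 5);
    const char *v = getenv("DEMO_SETENV_VAR");
    return v != NULL && strcmp(v, "first") == 0;
}
```

Both functions should return 1. The first shows why a string that
lives inside a dlopen()ed component becomes a dangling __environ entry
once the component is unmapped; the second suggests that setenv() would
not have this problem, at the cost of a copy.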

Here is a reproducer:

/* launched by "srun --resv-port" */
#include <stdlib.h>   /* getenv() */
#include <mpi.h>      /* MPI_Init(), MPI_Finalize() */

int main(int argc, char **argv) {
      MPI_Init(&argc, &argv);
              /* dlopens ess_slurmd.so */
              /* ess_slurmd.so will call putenv() */
      MPI_Finalize();
              /* dlcloses ess_slurmd.so */
              /* unmaps the reference, __environ is corrupted */
      getenv("unknown_var");
              /* Will read all vars from __environ and crash */
      return 0;
}

Attached is a patch to fix the bug. It calls unsetenv() at
MPI_Finalize() to clean the environment.
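
The patch itself is not reproduced in this message. As a rough sketch
of the idea only, assuming a component that registers one variable, the
pattern could look like the code below. The names component_init,
component_finalize, env_references_component, and DEMO_COMPONENT_VAR
are hypothetical and do not come from the Open MPI sources.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600   /* expose putenv() and unsetenv() under strict -std modes */
#endif
#include <stddef.h>
#include <stdlib.h>

extern char **environ;

/* Stands in for a string that lives inside the component's mapping. */
static char component_entry[] = "DEMO_COMPONENT_VAR=1";

/* What the component would do at init time: publish the variable. */
int component_init(void) {
    return putenv(component_entry);
}

/* What the fix would add at finalize time: drop the entry before unload. */
int component_finalize(void) {
    return unsetenv("DEMO_COMPONENT_VAR");
}

/* Returns 1 if any environ slot still points at the component's string. */
int env_references_component(void) {
    if (environ == NULL)
        return 0;
    for (char **e = environ; *e != NULL; e++) {
        if (*e == component_entry)
            return 1;
    }
    return 0;
}
```

After component_init() the check should return 1, and after
component_finalize() it should return 0, so a later getenv() never
follows a pointer into the unloaded component.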

Thank you,
Damien