
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] srun + Intel OpenMP = SIGSEGV
From: Ralph Castain (rhc_at_[hidden])
Date: 2010-06-15 11:41:13


Makes sense to me - thanks!

On Jun 15, 2010, at 9:32 AM, Damien Guinier wrote:

> Using Intel OpenMP in conjunction with srun seems to cause a segmentation fault, at least in the 1.5 branch.
>
> After tracking this strange bug for a long time, I found that the slurmd ess component was corrupting the __environ structure. This causes a crash in Intel OpenMP, which calls getenv() after MPI_Finalize.
>
> Specifically, during MPI_Init the slurmd component calls putenv(), which places a pointer to a const string located in the component's mmap'ed text into the environment instead of a copy. At MPI_Finalize the component is dlclose()d and its mapping is released, which leaves __environ pointing to memory that no longer exists.
>
> Since Intel OpenMP looks up an environment variable that does not exist, getenv() has to read every entry in __environ, reaches the dangling pointer, and crashes.
>
> Here is a reproducer:
>
> /* launched by "srun --resv-port" */
> #include <stdlib.h>
> #include <mpi.h>
>
> int main(int argc, char **argv) {
>     MPI_Init(&argc, &argv);
>     /* dlopens ess_slurmd.so, which calls putenv() */
>     MPI_Finalize();
>     /* dlcloses ess_slurmd.so: its mapping is released, so __environ
>        now holds a dangling pointer */
>     getenv("unknown_var");
>     /* reads every entry in __environ and crashes on the dangling one */
>     return 0;
> }
>
> Attached is a patch to fix the bug. It calls unsetenv() at MPI_Finalize() to clean the environment.
>
> Thank you
> Damien
>
>
> diff -r 9d999fdda967 -r 57de231642e2 orte/mca/ess/slurmd/ess_slurmd_module.c
> --- a/orte/mca/ess/slurmd/ess_slurmd_module.c Fri Jun 04 15:29:28 2010 +0200
> +++ b/orte/mca/ess/slurmd/ess_slurmd_module.c Tue Jun 15 11:45:02 2010 +0200
> @@ -387,7 +387,8 @@
> ORTE_ERROR_LOG(ret);
> }
> }
> -
> + unsetenv("OMPI_MCA_grpcomm");
> + unsetenv("OMPI_MCA_routed");
> /* deconstruct my nidmap and jobmap arrays - this
> * function protects itself from being called
> * before things were initialized
>