Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Segmentation fault in MPI_Finalize with IB hardware and memory manager.
From: guillaume ranquet (guillaume.ranquet_at_[hidden])
Date: 2010-06-02 13:31:49


On 06/02/2010 06:00 PM, Scott Atchley wrote:
> On Jun 2, 2010, at 11:52 AM, Scott Atchley wrote:
>
>> What if you explicitly disable MX?
>>
>> ~/openmpi-1.4.2-bin/bin/mpirun --mca btl openib,sm,self --mca btl ^mx ~/bwlat/mpi_helloworld
>
> And can you try this as well?
>
> ~/openmpi-1.4.2-bin/bin/mpirun --mca btl openib,sm,self --mca pml ^cm ~/bwlat/mpi_helloworld
>
> Thanks,
>
> Scott

Of course I can :)

The first command seems to be wrong; I got this error message:
MCA framework parameters can only take a single negation operator
("^"), and it must be at the beginning of the value. The following
value violates this rule:

    openib,sm,self,^mx
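
For illustration only, the rule in that error message can be sketched as a small shell check. The `check_mca_value` function below is a hypothetical helper, not an Open MPI tool; it assumes a POSIX shell and tests only the "single leading ^" rule quoted above, not whether the named components exist.

```shell
# Hypothetical helper (not part of Open MPI): succeeds if the value obeys the
# rule from the error message, i.e. at most one "^" and only as first character.
check_mca_value() {
    value=$1
    case $value in
        "^"*) rest=${value#"^"} ;;   # strip the one permitted leading "^"
        *)    rest=$value ;;
    esac
    case $rest in
        *"^"*) return 1 ;;           # any remaining "^" violates the rule
        *)     return 0 ;;
    esac
}

check_mca_value "openib,sm,self"     && echo "accepted: openib,sm,self"
check_mca_value "^mx"                && echo "accepted: ^mx"
check_mca_value "openib,sm,self,^mx" || echo "rejected: openib,sm,self,^mx"
```

Under this rule an explicit list and an exclusion cannot be combined in one value; each form has to be passed on its own.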

I tried to put the options in reverse order:
granquet@bordeplage-9 ~/openmpi-1.4.2 $ ~/openmpi-1.4.2-bin/bin/mpirun --mca btl ^mx --mca btl openib,sm,self ~/bwlat/mpi_helloworld
<snip>
  BTLs attempted: tcp
</snip>
I guess I got the command line wrong; it seems I disabled everything but tcp.

I then tried this:

granquet@bordeplage-26 ~ $ ~/openmpi-1.4.2-bin/bin/mpirun --mca btl ^mx ~/bwlat/mpi_helloworld
[bordeplage-26.bordeaux.grid5000.fr:03346] Error in mx_init (error No MX device entry in /dev.)
Hello world from process 0 of 1
[bordeplage-26:03346] *** Process received signal ***
[bordeplage-26:03346] Signal: Segmentation fault (11)
[bordeplage-26:03346] Signal code: Address not mapped (1)
[bordeplage-26:03346] Failing at address: 0x7fb51995b360
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 3346 on node
bordeplage-26.bordeaux.grid5000.fr exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

As I'm not doing anything in that helloworld, I then put only self in there:
granquet@bordeplage-26 ~ $ ~/openmpi-1.4.2-bin/bin/mpirun --mca btl self ~/bwlat/mpi_helloworld
[bordeplage-26.bordeaux.grid5000.fr:03375] Error in mx_init (error No MX device entry in /dev.)
Hello world from process 0 of 1

granquet@bordeplage-9 ~/openmpi-1.4.2 $ ~/openmpi-1.4.2-bin/bin/mpirun --mca btl openib,sm,self --mca pml ^cm ~/bwlat/mpi_helloworld
Hello world from process 0 of 1
granquet@bordeplage-9 ~/openmpi-1.4.2 $

I can tell it works :)

> Ok, there is no segfault when it can't find IB.
>
> Which version of OMPI are you running on the IB nodes? 1.4.2?
>
> I can try to write a patch that does not alter the mpool if MX is not
> available.
>
> Scott

The goal is to run the same version on every node (for the sake of
simplicity).
The current plan targets 1.4.1, but I don't think our users would mind
upgrading to 1.4.2.

Thanks for the help, much appreciated :)

