Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Segmentation fault in MPI_Finalize with IB hardware and memory manager.
From: guillaume ranquet (guillaume.ranquet_at_[hidden])
Date: 2010-06-02 13:31:49

On 06/02/2010 06:00 PM, Scott Atchley wrote:
> On Jun 2, 2010, at 11:52 AM, Scott Atchley wrote:
>> What if you explicitly disable MX?
>> ~/openmpi-1.4.2-bin/bin/mpirun --mca btl openib,sm,self --mca btl ^mx ~/bwlat/mpi_helloworld
> And can you try this as well?
> ~/openmpi-1.4.2-bin/bin/mpirun --mca btl openib,sm,self --mca pml ^cm ~/bwlat/mpi_helloworld
> Thanks,
> Scott

of course I can :)

the first command seems to be wrong; I got an error message:
MCA framework parameters can only take a single negation operator
("^"), and it must be at the beginning of the value. The following
value violates this rule:


I tried to put the options in reverse order:
granquet_at_bordeplage-9 ~/openmpi-1.4.2 $ ~/openmpi-1.4.2-bin/bin/mpirun --mca btl ^mx --mca btl openib,sm,self ~/bwlat/mpi_helloworld
  BTLs attempted: tcp
I guess I got the command line wrong; it seems I disabled everything but tcp.
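
For reference, the rule stated in the error message quoted earlier (at most one "^", and only at the very beginning of the value) can be sketched as a small shell check. This is only an illustration of the rule as worded in that message, not Open MPI's actual parser; the function name mca_value_ok is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): checks a value against the
# rule quoted in the error message, i.e. at most one "^" and only as the
# first character of the value.
mca_value_ok() {
    case "$1" in
        "^"*) rest=${1#"^"} ;;   # strip the single leading negation operator
        *)    rest=$1 ;;         # no negation: check the value as-is
    esac
    case "$rest" in
        *"^"*) return 1 ;;       # a "^" anywhere else violates the rule
        *)     return 0 ;;
    esac
}
```

With this sketch, mca_value_ok "^mx" and mca_value_ok "openib,sm,self" both succeed, while an illustrative mixed value such as "openib,sm,self,^mx" is rejected.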

I then tried this:

granquet_at_bordeplage-26 ~ $ ~/openmpi-1.4.2-bin/bin/mpirun --mca btl ^mx
[] Error in mx_init (error No MX
device entry in /dev.)
Hello world from process 0 of 1
[bordeplage-26:03346] *** Process received signal ***
[bordeplage-26:03346] Signal: Segmentation fault (11)
[bordeplage-26:03346] Signal code: Address not mapped (1)
[bordeplage-26:03346] Failing at address: 0x7fb51995b360
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 3346 on node exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

as I'm not doing anything in that helloworld, I just put self in there.
granquet_at_bordeplage-26 ~ $ ~/openmpi-1.4.2-bin/bin/mpirun --mca btl self
[] Error in mx_init (error No MX
device entry in /dev.)
Hello world from process 0 of 1

granquet_at_bordeplage-9 ~/openmpi-1.4.2 $ ~/openmpi-1.4.2-bin/bin/mpirun --mca btl openib,sm,self --mca pml ^cm ~/bwlat/mpi_helloworld
Hello world from process 0 of 1
granquet_at_bordeplage-9 ~/openmpi-1.4.2 $

I can tell it works :)

> Ok, there is no segfault when it can't find IB.
> Which version of OMPI are you running on the IB nodes? 1.4.2?
> I can try to write a patch that does not alter the mpool if MX is not
> Scott

the goal is to run the same version on every node (for the
sake of simplicity).
the current plans were targeting 1.4.1.
I don't think our users would mind upgrading to 1.4.2.

thanks for the help, much appreciated :)
