Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Segmentation fault in MPI_Finalize with IB hardware and memory manager.
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-06-02 13:51:41


On Jun 2, 2010, at 1:31 PM, guillaume ranquet wrote:

> > ~/openmpi-1.4.2-bin/bin/mpirun --mca btl openib,sm,self --mca pml ^cm ~/bwlat/mpi_helloworld
>
> the first command seems to be wrong, I had an error message:
> MCA framework parameters can only take a single negation operator

Correct. Scott's 2nd one should be tried.

Open MPI has 2 flavors of MX support: a BTL and an MTL. You need to disable both of them. His 2nd command effectively does that:

1. cm is the PML that uses MTLs; by disabling cm, you're disabling all MTLs -- including the MX MTL
2. By specifically listing openib,sm,self, you're only allowing those BTLs to be used (not the MX BTL).
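The two selection rules above can be sketched in shell form. This is only an illustration: it assumes the `OMPI_MCA_<param>` environment-variable form of MCA parameters as an alternative to `--mca` on the command line, and `./yourapp` is a placeholder for your own binary.

```shell
# Inclusive list: only these BTLs are eligible, so the MX BTL is never opened.
export OMPI_MCA_btl="openib,sm,self"

# Exclusive list: every PML except cm is eligible, so no MTL (including the
# MX MTL) gets used. A single leading ^ negates the whole list.
export OMPI_MCA_pml="^cm"

# Placeholder application; only launched if mpirun and the binary exist.
if command -v mpirun >/dev/null 2>&1 && [ -x ./yourapp ]; then
    mpirun ./yourapp
fi
```

A value is either an inclusive list or an exclusive one, which is why the first command was rejected with the "single negation operator" message.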

> granquet_at_bordeplage-9 ~/openmpi-1.4.2 $ ~/openmpi-1.4.2-bin/bin/mpirun
> --mca btl openib,sm,self --mca pml ^cm ~/bwlat/mpi_helloworld
> Hello world from process 0 of 1
> granquet_at_bordeplage-9 ~/openmpi-1.4.2 $

Good.

> > Ok, there is no segfault when it can't find IB.

I'm not sure I follow this comment.

From your prior mails:

- there's no segv when ptmalloc is disabled at run-time via the env var
- there's no segv when MX is completely disabled (both BTL and MTL)

What happens if you run with only MX? I *assume* that works with no segv...?

It might be interesting to see what happens if you run with:

mpirun --mca btl mx,openib,sm,self --mca pml ^cm --mca mpi_leave_pinned 0 ...yourapp...

This should run with both verbs and MX, and the memory manager is in place at run-time, but it isn't being used to track memory. That's slightly different than having the memory manager in place at run-time *and* using it to track memory.
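A minimal sketch of how these runs could be compared side by side. `run_case`, `APP`, and `DRY_RUN` are hypothetical names introduced here for illustration; by default the script only prints the commands it would run, so set `DRY_RUN=0` and point `APP` at your own binary to actually launch them.

```shell
APP="${APP:-./yourapp}"      # placeholder application path (assumption)
DRY_RUN="${DRY_RUN:-1}"      # 1 = print the command only, 0 = really run it

# Hypothetical helper: print or run one mpirun configuration under a label.
run_case() {
    label="$1"; shift
    if [ "$DRY_RUN" = "1" ]; then
        echo "$label: mpirun $* $APP"
    else
        mpirun "$@" "$APP"
        echo "$label: exit status $?"
    fi
}

# Verbs only; MX BTL and MX MTL both excluded (reported to run without a segv).
run_case "no-mx" --mca btl openib,sm,self --mca pml "^cm"

# Verbs plus MX BTL; memory manager present but not used to track memory.
run_case "mx+verbs-no-leave-pinned" --mca btl mx,openib,sm,self --mca pml "^cm" --mca mpi_leave_pinned 0
```

Comparing which of these cases segfaults in MPI_Finalize should help narrow down whether the problem needs the memory manager to be actively tracking memory.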

> the goal is to run the same version everywhere on every nodes (for the
> sake of simplicity).
> the current plans were targeting 1.4.1.
> I don't think our users would mind upgrading to 1.4.2.

FWIW, it *is* the same version on all nodes -- you're just running with different MCA parameter values. Also FWIW, the sysadmin can hide these MCA params in a system-level file so that users don't have to deal with them, if that works for you. See:

    http://www.open-mpi.org/faq/?category=tuning#setting-mca-params
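As a rough sketch of what that could look like: the system-level file lives at `etc/openmpi-mca-params.conf` under the installation prefix and takes one `name = value` pair per line. The helper name `write_site_defaults` and the example prefix are assumptions made for illustration; the parameter values are simply the ones from the working command line above.

```shell
# Hypothetical helper: append site-wide MCA defaults to the system-level
# parameter file under the given Open MPI installation prefix.
write_site_defaults() {
    prefix="$1"
    conf="$prefix/etc/openmpi-mca-params.conf"
    mkdir -p "$prefix/etc" || return 1
    cat >> "$conf" <<'EOF'
# Site defaults (same effect as: --mca btl openib,sm,self --mca pml ^cm)
btl = openib,sm,self
pml = ^cm
EOF
}

# Example call (placeholder prefix; needs write access to the install tree):
# write_site_defaults "$HOME/openmpi-1.4.2-bin"
```

Users can still override these defaults on the mpirun command line when they need to.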

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/