Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] --mca btl_openib_if_include
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-10-18 07:48:29


On Oct 16, 2008, at 9:10 PM, Mostyn Lewis wrote:

> OpenMPI says for a:
> mpirun --prefix /tools/openmpi/1.4a1r19757_svn/connectx/gcc64/4.1.2/
> openib/rh_EL_4/x86_64/xeon -x LD_LIBRARY_PATH --mca
> btl_openib_verbose 1 --mca btl openib,self --mca
> btl_openib_if_include "mlx4_0:1,mlx4_1:1" -np 4 -machinefile
> dhosts ./IMB-MPI1.openmpi
>
> --------------------------------------------------------------------------
> WARNING: One or more nonexistent OpenFabrics devices/ports were
> specified:
>
> Host: r4450_3
> MCA parameter: mca_btl_if_include
> Nonexistent entities: "mlx4_0:1,mlx4_1:1"

I'm unable to replicate this problem. There might be some kind of bug
in the if_include parsing code, I guess, but I can't make it happen on
my machines. Can you dig into this code a bit?

The code in question is in
ompi/mca/btl/openib/btl_openib_component.c:get_port_list(). The
general scheme of that routine is as follows:

- mca_btl_openib_component.if_list is an argv-style array of the items
listed in btl_openib_if_include.
- we call get_port_list() for each device that is found
- we compare each item in .if_list to the device name and
device_name:port combination to see if it matches
- if we match, we include/exclude the device or port
- we then remove the entry from the .if_list

Later, if there are any entries left in .if_list (_component.c:2257),
then we didn't find them and issue the warning.
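
To make that scheme concrete, here is a rough sketch of the
match-and-remove logic. It is NOT the actual get_port_list() code: the
struct if_list type and the if_list_remove() and match_device() helpers
are hypothetical names invented for illustration, and the real routine
does more (include vs. exclude handling, port state checks, etc.).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ITEMS 16

/* Hypothetical stand-in for the argv-style .if_list array. */
struct if_list {
    const char *items[MAX_ITEMS];
    int count;
};

/* Remove entry i by shifting the later entries down one slot. */
static void if_list_remove(struct if_list *l, int i)
{
    assert(i >= 0 && i < l->count);
    for (int j = i; j < l->count - 1; j++) {
        l->items[j] = l->items[j + 1];
    }
    l->count--;
}

/*
 * Called once per device that is found.  Each list entry is compared
 * against "dev" and against "dev:port" for ports 1..num_ports.  Every
 * entry that matches is removed from the list.  Returns the number of
 * entries that matched this device.
 */
static int match_device(struct if_list *l, const char *dev, int num_ports)
{
    int matched = 0;
    char name[64];

    for (int i = 0; i < l->count; ) {
        int hit = (strcmp(l->items[i], dev) == 0);
        for (int p = 1; !hit && p <= num_ports; p++) {
            snprintf(name, sizeof(name), "%s:%d", dev, p);
            hit = (strcmp(l->items[i], name) == 0);
        }
        if (hit) {
            if_list_remove(l, i);   /* do not advance; entries shifted */
            matched++;
        } else {
            i++;
        }
    }
    return matched;
}
```

In this sketch, a list holding "mlx4_0:1" and "mlx4_1:1" ends up empty
once match_device() has been called for both mlx4_0 and mlx4_1. Any
entry still on the list after all devices have been visited is what
would trigger the warning.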

Can you dig into why items are being left on the .if_list?

One thing I will mention: it looks like the help message may be a
little ambiguous -- the ports aren't necessarily nonexistent, they
could also be non-ACTIVE. From your ibstatus output, it doesn't look
like this is the case, though (I assume the ibstatus output you showed
was from the r4450_3 host, right?).

I'll go update that help message to be a bit more clear.

FWIW, OMPI should normally silently ignore the DOWN ports and just run
over the ACTIVE ports if you don't specify an _if_include list. But
regardless, it would be good to solve this issue -- it's a bit
troubling that you appear to be specifying ACTIVE ports and OMPI still
issues a warning.

-- 
Jeff Squyres
Cisco Systems