Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] --mca btl_openib_if_include
From: Mostyn Lewis (Mostyn.Lewis_at_[hidden])
Date: 2008-10-18 21:17:29


Jeff,

I traced this, and it was the quote marks around "mlx4_0:1,mlx4_1:1": they were
passed in as part of the parameter value and caused the mismatch :-(

Sorry about that.

Regards,
DM

On Sat, 18 Oct 2008, Jeff Squyres wrote:

> On Oct 16, 2008, at 9:10 PM, Mostyn Lewis wrote:
>
>> Open MPI reports the following for this command:
>> mpirun --prefix /tools/openmpi/1.4a1r19757_svn/connectx/gcc64/4.1.2/openib/rh_EL_4/x86_64/xeon \
>>     -x LD_LIBRARY_PATH --mca btl_openib_verbose 1 --mca btl openib,self \
>>     --mca btl_openib_if_include "mlx4_0:1,mlx4_1:1" \
>>     -np 4 -machinefile dhosts ./IMB-MPI1.openmpi
>>
>> --------------------------------------------------------------------------
>> WARNING: One or more nonexistent OpenFabrics devices/ports were
>> specified:
>>
>> Host: r4450_3
>> MCA parameter: mca_btl_if_include
>> Nonexistent entities: "mlx4_0:1,mlx4_1:1"
>
> I'm unable to replicate this problem. There might be some kind of bug in the
> if_include parsing code, I guess, but I can't make it happen on my machines.
> Can you dig into this code a bit?
>
> The code in question is in
> ompi/mca/btl/openib/btl_openib_component.c:get_port_list(). The general
> scheme of that routine is as follows:
>
> - mca_btl_openib_component.if_list is an argv-style array of the items listed
> in btl_openib_if_include.
> - we call get_port_list() for each device that is found
> - we compare each item in .if_list to the device name and device_name:port
> combination to see if it matches
> - if we match, we include/exclude the device or port
> - we then remove the entry from the .if_list
>
> Later, if there are any entries left in .if_list (_component.c:2257), then we
> didn't find them and issue the warning.
>
> Can you dig into why items are being left on the .if_list?
>
> One thing I will mention: it looks like the help message may be a little
> ambiguous -- the ports aren't necessarily nonexistent; they could also be
> non-ACTIVE. From your ibstatus output, it doesn't look like this is the
> case, though (I assume the ibstatus output you showed was from the r4450_3
> host, right?).
>
> I'll go update that help message to make it clearer.
>
> FWIW, OMPI should normally silently ignore the DOWN ports and just run over
> the ACTIVE ports if you don't specify an _if_include list. But regardless,
> it would be good to solve this issue -- it's a bit troubling that you appear
> to be specifying ACTIVE ports and OMPI still issues a warning.
>
> --
> Jeff Squyres
> Cisco Systems