Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] two questions about 1.7.1
From: Paul Kapinos (kapinos_at_[hidden])
Date: 2013-12-04 04:31:32


On 12/03/13 23:27, Jeff Squyres (jsquyres) wrote:
> On Nov 22, 2013, at 1:19 PM, Paul Kapinos <kapinos_at_[hidden]> wrote:
>
>> Well, I've tried this patch on the current 1.7.3 (where the code has moved by some 12 lines, now beginning at line 2700).
>> !! - no "skipping device" output! Not even when starting the main processes with -bind-to-socket. What I see is
>>> [cluster.rz.RWTH-Aachen.DE:43670] btl:usnic: found: device mlx4_1, port 1
>>> [cluster.rz.RWTH-Aachen.DE:43670] btl:usnic: this is not a usnic-capable device
>>> [cluster.rz.RWTH-Aachen.DE:43670] btl:usnic: found: device mlx4_0, port 1
>>> [cluster.rz.RWTH-Aachen.DE:43670] btl:usnic: this is not a usnic-capable device
>
> That's actually ok -- that's from the usnic BTL, not the openib BTL.
>
> The usnic BTL is the Cisco UD verbs component, and it only works with Cisco UCS servers and VICs; it will not work with generic IB cards. Hence, these messages are telling you that the usnic BTL is disqualifying itself because the ibv devices it found are not Cisco UCS VICs.
>

Argh - how embarrassing that I overlooked the "btl:usnic" prefix :-|

> Look for the openib messages, not the usnic messages.

Well, as I said, there were *no messages* from the patch you provided in
http://www.open-mpi.org/community/lists/devel/2013/06/12472.php
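To double-check that nothing from openib is hiding among the usnic lines, the verbose output can be split by BTL name. This is only a minimal sketch: the line layout is taken from the usnic lines quoted above, and the assumption that openib messages carry a matching "btl:openib:" prefix is mine, not something confirmed in this thread. The helper name filter_btl is hypothetical.

```python
import re

# Layout taken from the quoted output, e.g.
#   [cluster.rz.RWTH-Aachen.DE:43670] btl:usnic: found: device mlx4_1, port 1
# ASSUMPTION: other BTLs (e.g. openib) use the same "btl:<name>:" prefix.
LINE_RE = re.compile(
    r"^\[(?P<host>[^\]:]+):(?P<pid>\d+)\]\s+btl:(?P<btl>\w+):\s*(?P<msg>.*)$"
)


def filter_btl(lines, btl_name):
    """Return (host, pid, message) tuples for lines emitted by one BTL."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("btl") == btl_name:
            hits.append((m.group("host"), int(m.group("pid")), m.group("msg")))
    return hits


# The four lines quoted earlier in this thread.
SAMPLE = [
    "[cluster.rz.RWTH-Aachen.DE:43670] btl:usnic: found: device mlx4_1, port 1",
    "[cluster.rz.RWTH-Aachen.DE:43670] btl:usnic: this is not a usnic-capable device",
    "[cluster.rz.RWTH-Aachen.DE:43670] btl:usnic: found: device mlx4_0, port 1",
    "[cluster.rz.RWTH-Aachen.DE:43670] btl:usnic: this is not a usnic-capable device",
]

if __name__ == "__main__":
    for name in ("usnic", "openib"):
        print(name, len(filter_btl(SAMPLE, name)))
```

Applied to the four quoted lines, every hit belongs to usnic and the openib list stays empty, which matches what I see in the full output.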

I've attached the output of a run with a single process per node on nodes with 2 NICs; maybe
you can see what goes wrong.

Best

Paul

-- 
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915