Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] InfiniBand path migration not working
From: Yevgeny Kliteynik (kliteyn_at_[hidden])
Date: 2012-03-11 04:05:23


Hi,

I just noticed that my previous mail bounced,
but it doesn't matter. Please ignore it if
you got it anyway - I re-read the thread and
there is a much simpler way to do it.

If you want to check whether LID L is reachable
through HCA H from port P, you can run this command:

   smpquery --Ca H --Port P NodeInfo L

Example:

   smpquery --Ca mlx4_0 --Port 2 NodeInfo 4

If you don't get a response, or you get info for
a device different from the one you expect,
then the two ports are not part of the same
subnet, and APM is expected to fail.
Otherwise - it's probably a bug.
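
If you need to repeat this check for several ports or LIDs,
the command can be wrapped in a small shell function. This is
only a sketch: check_lid and the SMPQUERY variable are names
made up for illustration, and it assumes smpquery exits with a
non-zero status when the query gets no response.

```shell
#!/bin/sh
# Sketch only: check_lid is a hypothetical helper, not part of infiniband-diags.
# SMPQUERY may be overridden to point at a different smpquery binary.
SMPQUERY="${SMPQUERY:-smpquery}"

# check_lid CA PORT LID
# Sends a NodeInfo query to LID through the given HCA and port.
# Assumption: smpquery returns non-zero when there is no response.
check_lid() {
    ca="$1"; port="$2"; lid="$3"
    if out=$("$SMPQUERY" --Ca "$ca" --Port "$port" NodeInfo "$lid" 2>&1); then
        # Print the NodeInfo so it can be compared with the expected device.
        printf '%s\n' "$out"
        echo "LID $lid answered via $ca port $port; confirm it is the device you expect"
        return 0
    else
        echo "no response from LID $lid via $ca port $port" >&2
        return 1
    fi
}
```

For the example above you would call: check_lid mlx4_0 2 4
A reply only shows that the LID is reachable; you still have to
compare the printed NodeInfo with the device you expect.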

-- YK

On 08-Mar-12 5:44 PM, Shamis, Pavel wrote:
> Jeremy,
> I finally had a chance to look at the log file.
>
> Initially all QPs are created on port 1, and at the same time the alternative path is loaded (port 2, LIDs 4 and 2). I guess that at some point you switched off port 1; an APM event is reported because the alternative path is now active, and for some reason the IB message is dropped.
>
> You may ignore the APM warning. Essentially, since the alternative path is now active, it tries to see whether OMPI can pre-load the next good path in case of a future failure on port 2. Since port 3 does not exist, it reports the warning.
>
> My educated guess is that for some reason there is no direct connection path between LID 2 and LID 4. To prove it we have to look at the OpenSM routing information.
>
> On the mailing list we have a representative from Mellanox who should be able to help us extract the routing information.
>
> Evgeny,
>
> Can you please help ?
>
>
> Regards,
>
> Pavel (Pasha) Shamis
> ---
> Application Performance Tools Group
> Computer Science and Math Division
> Oak Ridge National Laboratory
>
> On Feb 29, 2012, at 5:38 PM, Jeremy wrote:
>
>> Hi Pasha,
>>
>>> On Wed, Feb 29, 2012 at 11:02 AM, Shamis, Pavel<shamisp_at_[hidden]> wrote:
>>>
>>> I would like to see the whole file.
>>> Is 28MB the size after compression?
>>>
>>> I think Gmail supports up to 25MB.
>>> You may try to create a gzip file and then slice it using the "split" command.
>> See attached. At about line 151311 is when I unplugged the cable from
>> Port 1. Then I see the APM error message at about line 178905.
>>
>> Thanks,
>>
>> -Jeremy
>> <debug.txt.bz2>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users