Open MPI User's Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI users] InfiniBand path migration not working
From: Shamis, Pavel (shamisp_at_[hidden])
Date: 2012-03-08 10:44:22


Jeremy,
I finally had a chance to look at the log file.

Initially all QPs are created on port 1, and at the same time the alternative path is loaded (port 2, LIDs 4 and 2). I guess at some point you switched off port 1; the APM event is reported because the alternative path is now active, and for some reason an IB message is dropped.

You may ignore the APM warning. Since the alternative path is now active, OMPI is trying to pre-load the next good path in case of a future failure on port 2. Because port 3 does not exist, it reports the warning.
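
As a rough illustration of why the warning is harmless, here is a toy model in Python. It is not Open MPI's actual code; the function name and the assumption that the alternate path is simply the next port number on the same HCA are mine, chosen to match what the log shows (port 1 falls back to port 2, and there is no port 3).

```python
def next_alternate_port(active_port, num_ports):
    """Return the port to pre-load as the alternate path, or None.

    Toy model (an assumption, not Open MPI's logic): the alternate
    path is the next port number on the same HCA.
    """
    candidate = active_port + 1
    return candidate if candidate <= num_ports else None


# Dual-port HCA, as in this thread.
print(next_alternate_port(1, 2))  # 2: port 2 is pre-loaded as the alternate
print(next_alternate_port(2, 2))  # None: no port 3, hence the warning
```

In this model the second call is the situation after migration: the active path is already on the last port, so there is nothing left to pre-load.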

My educated guess is that for some reason there is no direct connection path between LID 2 and LID 4. To prove it, we have to look at the OpenSM routing information.
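
To make concrete what checking the routing would mean: the subnet manager programs each switch with a forwarding table that maps a destination LID to an output port, and a path exists only if following those entries hop by hop reaches the destination. The Python sketch below walks such tables. The switch names, port numbers, and table contents are entirely made up for illustration; they are not taken from Jeremy's fabric or from any OpenSM output.

```python
def trace_route(lfts, links, start_switch, dest_lid, max_hops=16):
    """Follow per-switch forwarding tables toward dest_lid.

    lfts:  {switch: {dest_lid: out_port}}
    links: {(switch, out_port): next_node}, where next_node is another
           switch name or ("lid", n) for an end port.
    Returns the list of switches visited, or None if unreachable.
    """
    path = []
    node = start_switch
    for _ in range(max_hops):
        path.append(node)
        out_port = lfts.get(node, {}).get(dest_lid)
        if out_port is None:
            return None  # no forwarding entry for this LID
        nxt = links.get((node, out_port))
        if nxt == ("lid", dest_lid):
            return path  # reached the destination end port
        if nxt is None or isinstance(nxt, tuple):
            return None  # dangling port or wrong end port
        node = nxt
    return None  # hop limit exceeded, probably a loop


# Hypothetical two-switch fabric: LID 2 hangs off sw1, LID 4 off sw2.
lfts = {"sw1": {2: 1, 4: 3}, "sw2": {4: 1}}  # sw2 has no entry for LID 2
links = {
    ("sw1", 1): ("lid", 2),
    ("sw1", 3): "sw2",
    ("sw2", 1): ("lid", 4),
    ("sw2", 3): "sw1",
}

print(trace_route(lfts, links, "sw1", 4))  # ['sw1', 'sw2']
print(trace_route(lfts, links, "sw2", 2))  # None
```

The second call shows the kind of gap I suspect: the cable is there, but one direction has no route programmed, so traffic toward LID 2 is dropped.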

On the mailing list we have a representative from Mellanox who should be able to help us extract the routing information.

Evgeny,

Can you please help?

Regards,

Pavel (Pasha) Shamis

---
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory

On Feb 29, 2012, at 5:38 PM, Jeremy wrote:
> Hi Pasha,
> 
>> On Wed, Feb 29, 2012 at 11:02 AM, Shamis, Pavel <shamisp_at_[hidden]> wrote:
>> 
>> I would like to see the whole file.
>> Is 28 MB the size after compression?
>> 
>> I think Gmail supports up to 25 MB.
>> You may try to create a gzip file and then slice it using the "split" command.
> 
> See attached. At about line 151311 is when I unplugged the cable from
> Port 1. Then I see the APM error message at about line 178905.
> 
> Thanks,
> 
> -Jeremy
> <debug.txt.bz2>