
Subject: Re: [OMPI users] problems with MPI_Waitsome/MPI_Allstart and OpenMPI on gigabit and IB networks
From: Joe Landman (landman_at_[hidden])
Date: 2008-07-20 10:58:00


Joe Landman wrote:

>
> 3) using btl to turn off sm and openib generates lots of these messages:
>
> [c1-8][0,1,4][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect]
> connect() failed with errno=113

[...]

> No route to host at -e line 1.
>
> This is wrong; all the nodes are visible from all the other nodes on a
> private subnet. For example:

OK, fixed this. It turns out we have IPoIB configured, and one adapter
needed to be brought down and back up. Now the TCP version appears to
run, though I still get the strange hangs after a random (never the
same) number of iterations.
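
For the archives: errno 113 on Linux is EHOSTUNREACH, i.e. the same
"No route to host" reported above, which is consistent with the TCP
BTL trying to connect over the downed IPoIB interface. One way to keep
it from wandering onto the wrong network is to pin the TCP BTL to a
known-good interface; something like the following (eth0, the process
count, and the binary name are placeholders for whatever your cluster
uses):

  mpirun --mca btl tcp,self --mca btl_tcp_if_include eth0 -np 16 ./mpitest

btl_tcp_if_exclude works the other way around, if it is easier to name
the interface to avoid.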

The fact that the hangs are random, and don't appear at the same time
step but at a similar place in the code, suggests to me that something
may be amiss in MPI_Waitsome. Possibly a completion was posted and,
due to buffer sizes, fell off the scoreboard.
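
To make that hypothesis concrete, here is a minimal, self-contained
sketch of the MPI_Waitsome drain pattern in question (not our actual
code; the request count, buffer sizes, and self-sends are invented for
illustration). The loop only exits once every posted request has been
reported complete, so a single lost completion would leave it spinning
forever, which is exactly the hang described above:

#include <mpi.h>
#include <stdio.h>

#define NREQ 8      /* requests per direction; arbitrary */
#define LEN  1024   /* doubles per message; arbitrary */

int main(int argc, char **argv)
{
    MPI_Request reqs[2 * NREQ];
    int indices[2 * NREQ];
    static double rbuf[NREQ][LEN], sbuf[NREQ][LEN];
    int done = 0, outcount, i, rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Post nonblocking receives and matching self-sends. */
    for (i = 0; i < NREQ; i++) {
        MPI_Irecv(rbuf[i], LEN, MPI_DOUBLE, rank, i,
                  MPI_COMM_WORLD, &reqs[i]);
        MPI_Isend(sbuf[i], LEN, MPI_DOUBLE, rank, i,
                  MPI_COMM_WORLD, &reqs[NREQ + i]);
    }

    /* Drain completions. Each finished request is reported exactly
     * once and then set to MPI_REQUEST_NULL, so 'done' reaches
     * 2*NREQ only if no completion is ever lost; otherwise this
     * loop never terminates (the hang described above). */
    while (done < 2 * NREQ) {
        MPI_Waitsome(2 * NREQ, reqs, &outcount, indices,
                     MPI_STATUSES_IGNORE);
        done += outcount;
    }

    if (rank == 0)
        printf("all %d requests completed\n", 2 * NREQ);

    MPI_Finalize();
    return 0;
}

Built with mpicc and run under mpirun, this should finish immediately
on any healthy network, so a hang at the equivalent spot in the real
code would be consistent with the library losing a completion.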

Any thoughts?

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman_at_[hidden]
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615