
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] problems with MPI_Waitsome/MPI_Allstart and OpenMPI on gigabit and IB networks
From: Joe Landman (landman_at_[hidden])
Date: 2008-07-20 10:58:00


Joe Landman wrote:

>
> 3) using btl to turn off sm and openib, generates lots of these messages:
>
> [c1-8][0,1,4][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect]
> connect() failed with errno=113

[...]

> No route to host at -e line 1.
>
> This is wrong, all the nodes are visible from all the other nodes on a
> private subnet. For example:

OK, fixed this. It turns out we have IPoIB running, and one adapter needed
to be brought down and back up. The TCP version now appears to run,
though I still get the strange hangs after a random (never the same)
number of iterations.

The hangs are random: they don't occur at the same time step, but they
do occur at a similar place in the code. That suggests to me that
something may be amiss in the MPI_Waitsome function. Possibly a
completion was posted and, due to buffer sizes, fell off the scoreboard.

Any thoughts?

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman_at_[hidden]
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615