
Open MPI User's Mailing List Archives


From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-04-20 06:53:11


On Apr 19, 2007, at 11:27 PM, Babu Bhai wrote:

> I have already seen this FAQ. The nodes in the cluster do not have
> multiple IP addresses. One thing I forgot to mention is that the
> systems in the cluster do not have static IPs; they get their IP
> addresses through DHCP.

Ok, that should be fine.

> Also, if there is a print statement (printf("hello world\n");) in
> the slave, it is correctly printed on the master's console, but
> none of the MPI commands work.

I'm not sure I follow -- which MPI commands are you referring to:
mpirun? Something else?

I think you're saying that the MPI job starts up, printf works fine,
but then something goes bad...? Are you saying that MPI *functions*
(like MPI_SEND) don't seem to work? (I'm a little confused by your
use of the word "command".)

If that is the case, then this is a bit more odd because it means
that OMPI started up, launched your job, and did some "out of band"
communication, but then failed the first time it tried to establish
MPI communications.
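
For concreteness, here's a minimal sketch of the kind of master/slave
test I'm assuming you're running (my reconstruction, not your actual
code); the MPI_Send is the first point where the TCP BTL has to open
a real socket between the two nodes:

    /* Sketch of a master/slave test: rank 0 sends one integer to
       rank 1.  This is a reconstruction, not the poster's code. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, value = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            /* First point-to-point call: the TCP BTL opens its
               socket here, so a routing failure surfaces here. */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("slave received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }

If the printf output shows up but the first MPI_Send/MPI_Recv pair
dies with the errno=113 message, that's the TCP BTL failing to
connect, not a bug in your code.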

Are you running any firewall or port-blocking software on either of
the nodes? Is each node routable from the other? (On Linux, at
least, errno 113 is "No route to host", which would tend to imply
that one host could not open a socket to the other because it
couldn't route there.)
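
One quick way to check routability outside of Open MPI entirely is
to try the same connect() by hand.  Here's a throwaway diagnostic I'm
sketching (not an Open MPI tool); run it from one node against the
other node's IP and a port you know is open (e.g., 22 for sshd):

    /* Throwaway diagnostic: attempt a TCP connect() and report the
       errno, the same failure path as the TCP BTL's connect(). */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <errno.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    int main(int argc, char **argv)
    {
        struct sockaddr_in addr;
        int fd;

        if (argc != 3) {
            fprintf(stderr, "usage: %s <ip> <port>\n", argv[0]);
            return 1;
        }

        fd = socket(AF_INET, SOCK_STREAM, 0);
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port = htons((unsigned short) atoi(argv[2]));
        inet_pton(AF_INET, argv[1], &addr.sin_addr);

        if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
            /* On a misrouted cluster this prints errno=113,
               "No route to host" -- the same error you're seeing. */
            printf("connect() failed: errno=%d (%s)\n",
                   errno, strerror(errno));
        } else {
            printf("connect() succeeded\n");
        }
        close(fd);
        return 0;
    }

If that also fails with errno=113, the problem is in the network or
firewall configuration, not in Open MPI.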

> regards,
>
> Abhishek
>
> >I need to make that error string be google-able -- I'll add it to the
> >faq. :-)
>
> >The problem is likely that you have multiple IP addresses, some of
> >which are not routable to each other (but fail OMPI's routability
> >assumptions). Check out these FAQ entries:
>
> >http://www.open-mpi.org/faq/?category=tcp#tcp-routability
> >http://www.open-mpi.org/faq/?category=tcp#tcp-selection
>
> >Does this help?
>
> >On Apr 19, 2007, at 11:07 AM, Babu Bhai wrote:
>
> >> I have migrated from LAM/MPI to Open MPI. I am not able to
> >> execute a simple MPI program in which the master sends an
> >> integer to the slave. If I execute the code on a single machine,
> >> i.e., start 2 instances on the same machine (mpirun -np 2
> >> hello), it works fine.
> >>
> >> If I execute it on the cluster using
> >> mpirun --prefix /usr/local -np 2 --host 199.63.34.154,199.63.34.36 hello
> >> it gives the following error:
> >> "[btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect]
> >> connect() failed with errno=113"
> >>
> >> I am using openmpi-1.2
> >>
> >> regards,
> >> Abhishek
>
> >--
> >Jeff Squyres
> >Cisco Systems

-- 
Jeff Squyres
Cisco Systems