
Subject: Re: [OMPI users] mpirun hangs
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-01-05 19:19:30


On Jan 5, 2009, at 5:01 PM, Maciej Kazulak wrote:

> Interesting though. I thought in such a simple scenario shared
> memory would be used for IPC (or whatever's fastest). But nope.
> Even with one process, it still wants to use TCP/IP to communicate
> between mpirun and orted.

Correct -- we only have TCP enabled for MPI process <--> orted
communication. There are several reasons why; the simplest is that
this is our "out of band" channel, and it is only used to set up and
tear down the job. As such, we don't care that it's a little slower
than other possible channels (such as sm). MPI traffic will use
shmem, OpenFabrics-based networks, Myrinet, etc. -- but not MPI
process <--> orted communication.
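
For example, you can steer the MPI traffic explicitly by restricting
the BTLs on the command line (a sketch; ./my_mpi_app is a placeholder
for your application):

    # Use only shared memory for MPI point-to-point traffic
    # ("self" is needed so a process can send to itself):
    mpirun --mca btl sm,self -np 2 ./my_mpi_app

    # Verify which BTL components your build actually has:
    ompi_info | grep btl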

> What's even more surprising to me is that it won't use loopback for
> that. Hence my somewhat over-restrictive iptables rules were the
> problem: I allowed traffic from 127.0.0.1 to 127.0.0.1 on lo, but
> not from <eth0_addr> to <eth0_addr> on eth0, and both processes
> ended up waiting for IO.
>
> Can I somehow configure it to use something other than TCP/IP here?
> Or at least switch it to loopback?
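
On the firewall side, note that locally-addressed packets traverse
the lo interface even when they carry the eth0 address, so the usual
fix is to allow everything on lo rather than matching on 127.0.0.1
(an untested sketch):

    # Permit all loopback-interface traffic, regardless of the
    # IP addresses the packets carry:
    iptables -A INPUT -i lo -j ACCEPT
    iptables -A OUTPUT -o lo -j ACCEPT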

I don't remember how it works in the v1.2 series offhand; I think it's
different in the v1.3 series (where all MPI processes *only* talk to
the local orted, vs. MPI processes making direct TCP connections back
to mpirun and to any other MPI process with which they need to
bootstrap other communication channels). I'm *guessing* that the MPI
process <--> orted communication either uses a named Unix socket or
TCP loopback. Ralph -- can you explain the details?
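
If you want to experiment, restricting the OOB TCP component to the
loopback interface should be enough for a single-node run (a sketch;
the parameter name is from the v1.2/v1.3 era, so double-check it with
ompi_info on your installation, and ./my_mpi_app is a placeholder):

    # Show the OOB TCP parameters your build supports:
    ompi_info --param oob tcp

    # Restrict the out-of-band channel to the loopback interface
    # (single-node runs only):
    mpirun --mca oob_tcp_include lo -np 2 ./my_mpi_app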

-- 
Jeff Squyres
Cisco Systems