Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] mpirun hangs
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-01-05 19:19:30


On Jan 5, 2009, at 5:01 PM, Maciej Kazulak wrote:

> Interesting though. I thought in such a simple scenario shared
> memory would be used for IPC (or whatever's fastest). But nope.
> Even with one process, it still wants to use TCP/IP to communicate
> between mpirun and orted.

Correct -- we only have TCP enabled for MPI process <--> orted
communication. There are several reasons why; the simplest is that
this is our "out of band" channel and it is only used to set up and
tear down the job. As such, we don't care that it's a little slower
than other possible channels (such as sm). MPI traffic will use
shmem, OpenFabrics-based networks, Myrinet, etc. But not MPI
process <--> orted communication.

> What's even more surprising to me is that it won't use loopback for
> that. Hence my perhaps over-restrictive iptables rules were the
> problem. I allowed traffic from 127.0.0.1 to 127.0.0.1 on lo but not
> from <eth0_addr> to <eth0_addr> on eth0 and both processes ended up
> waiting for IO.
>
> Can I somehow configure it to use something other than TCP/IP here?
> Or at least switch it to loopback?

I don't remember how it works in the v1.2 series offhand; I think it's
different in the v1.3 series (where all MPI processes *only* talk to
the local orted, vs. MPI processes making direct TCP connections back
to mpirun and any other MPI process with which it needs to bootstrap
other communication channels). I'm *guessing* that the MPI process
<--> orted communication either uses a named unix socket or TCP
loopback. Ralph -- can you explain the details?
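
For anyone hitting a similar hang, here is a minimal sketch of a check
for whether a host can open a TCP connection to one of its own
addresses, which is the condition the iptables rules above were
breaking. It is not part of Open MPI and does not reproduce what
mpirun or orted actually do; the function name can_connect_to_self and
the 2-second timeout are assumptions made up for illustration.

```python
import socket
import sys


def can_connect_to_self(addr, timeout=2.0):
    """Return True if a TCP connection from this host to `addr` completes.

    Hypothetical helper: it listens on an ephemeral port bound to `addr`
    and then connects to that port from the same host.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 asks the kernel to pick any free port.
        listener.bind((addr, 0))
        listener.listen(1)
        port = listener.getsockname()[1]

        # A firewall that silently drops the packets shows up as a timeout.
        client.settimeout(timeout)
        client.connect((addr, port))
        return True
    except OSError:
        # Covers timeouts, refused connections, and addresses that are
        # not configured on this host.
        return False
    finally:
        client.close()
        listener.close()


if __name__ == "__main__":
    # Always test loopback; extra addresses come from the command line.
    for addr in ["127.0.0.1"] + sys.argv[1:]:
        status = "ok" if can_connect_to_self(addr) else "blocked or unavailable"
        print(addr, status)
```

Running it with the eth0 address as an argument tests both paths. If
127.0.0.1 reports ok while the eth0 address does not, local firewall
rules are a likely suspect.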

-- 
Jeff Squyres
Cisco Systems