Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Issue with : mca_oob_tcp_peer_recv_connect_ack on SGI Altix
From: Ralph Castain (rhc_at_[hidden])
Date: 2010-12-15 15:03:59

On Dec 15, 2010, at 12:30 PM, Gilbert Grosdidier wrote:

> Good evening Ralph,
> On 15/12/2010 18:45, Ralph Castain wrote:
>> It looks like all the messages are flowing within a single job (all three processes mentioned in the error have the same identifier). Only possibility I can think of is that somehow you are reusing ports - is it possible your system doesn't have enough ports to support all the procs?
> It seems every worker node has a range of almost 30k ports available:
> > ssh r33i0n0 cat /proc/sys/net/ipv4/ip_local_port_range
> 32768 61000
> This is AFAIK the only way I can get info about this.
> Are these 30k ports enough?

Depends on how many nodes there are in your system.

> The question is: does OpenMPI open ports from every node towards every other node?
> In that case I could understand why it runs short of ports when
> I increase the number of nodes.

Yes - in two ways:

1. each ORTE daemon opens a port to every other daemon in the system. Thus, you need at least M ports if your job is running across M nodes

2. each MPI process will open a direct port to any other MPI process that it communicates with. So if you have N processes on a node, and they only communicate with processes on the 8 nearest neighbor nodes (each of which has N processes), and you are using the TCP btl, then you will consume an additional 8*N*N sockets on each node.
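
To make the arithmetic in the two points above concrete, here is a minimal Python sketch. It is only a rough estimate under the assumptions stated in this message (M daemon ports for M nodes, plus 8*N*N TCP btl sockets per node); the function names and the default of 8 neighbor nodes are illustrative and are not part of Open MPI.

```python
def ephemeral_port_count(low, high):
    """Size of an inclusive port range such as the two numbers
    reported by /proc/sys/net/ipv4/ip_local_port_range."""
    return high - low + 1


def estimated_sockets_per_node(num_nodes, procs_per_node, neighbor_nodes=8):
    """Rough per-node estimate following the two points above.

    Assumptions (hypothetical helper, not an Open MPI API):
      - daemon connections: about num_nodes (M) ports per node
      - TCP btl connections: each of the N local processes talks to
        all N processes on each neighbor node
    """
    daemon_ports = num_nodes
    btl_sockets = neighbor_nodes * procs_per_node * procs_per_node
    return daemon_ports + btl_sockets


if __name__ == "__main__":
    available = ephemeral_port_count(32768, 61000)
    # Example values only: 100 nodes with 8 processes per node.
    needed = estimated_sockets_per_node(num_nodes=100, procs_per_node=8)
    print(f"available: {available}, estimated need: {needed}")
```

Plugging in your own node count and processes per node gives a quick sanity check of whether the port range reported above is likely to be the limit.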

> But: is there a possibility (mca param?) to prevent OpenMPI from opening so many ports?
> Indeed, apart from rank 0 node, every MPI process will need to communicate with ONLY
> the 8 (nearest) neighbour nodes. So, there should be a switch somewhere telling OpenMPI
> to open a port ONLY when needed, but I did not find it among ompi_info stuff ;-)

It always only opens a port when something tries to communicate - we never open ports in advance.

> Which one is it ?
> Thanks, Best, G.