Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Issue with : mca_oob_tcp_peer_recv_connect_ack on SGI Altix
From: Ralph Castain (rhc_at_[hidden])
Date: 2010-12-15 15:03:59

On Dec 15, 2010, at 12:30 PM, Gilbert Grosdidier wrote:

> Good evening Ralph,
> On 15/12/2010 18:45, Ralph Castain wrote:
>> It looks like all the messages are flowing within a single job (all three processes mentioned in the error have the same identifier). Only possibility I can think of is that somehow you are reusing ports - is it possible your system doesn't have enough ports to support all the procs?
> It seems that a range of almost 30k ports is available on every worker node:
> > ssh r33i0n0 cat /proc/sys/net/ipv4/ip_local_port_range
> 32768 61000
> This is, AFAIK, the only way I can get information about this.
> Are these 30k ports enough?

Depends on how many nodes there are in your system.

> The question is: does OpenMPI open ports from every node towards every other node?
> If so, I could understand why it runs short of ports when
> I increase the number of nodes.

Yes - in two ways:

1. Each ORTE daemon opens a port to every other daemon in the system. Thus, you need at least M ports if your job is running across M nodes.

2. Each MPI process will open a direct port to any other MPI process that it communicates with. So if you have N processes on a node, and they only communicate with the 8 nearest neighbor nodes (each of which has N processes), and you are using the TCP btl, then you will consume an additional 8*N*N sockets on each node.
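
As a rough illustration of the arithmetic above, here is a minimal Python sketch that compares the two contributions (M for the daemons, 8*N*N for the TCP btl) against the size of the local port range reported earlier in the thread. The function names, the 128-node job size, and the processes-per-node values are hypothetical and chosen only for illustration. The sketch also assumes that both ends of the range are inclusive and that every socket consumes one local port, which is a conservative simplification.

```python
def parse_port_range(text):
    """Parse the two integers found in /proc/sys/net/ipv4/ip_local_port_range."""
    low, high = (int(field) for field in text.split())
    return low, high


def ephemeral_port_count(low, high):
    """Number of ports in the range, assuming both ends are inclusive."""
    return high - low + 1


def estimated_sockets_per_node(num_nodes, procs_per_node, neighbor_nodes=8):
    """Estimate from the thread: M daemon ports plus 8*N*N btl sockets."""
    daemon_sockets = num_nodes  # "at least M ports" for M nodes
    btl_sockets = neighbor_nodes * procs_per_node * procs_per_node
    return daemon_sockets + btl_sockets


if __name__ == "__main__":
    # Range reported for node r33i0n0 in the thread.
    low, high = parse_port_range("32768 61000")
    available = ephemeral_port_count(low, high)

    num_nodes = 128  # hypothetical job size, not taken from the thread
    for procs_per_node in (8, 16, 32, 64):  # hypothetical values of N
        needed = estimated_sockets_per_node(num_nodes, procs_per_node)
        verdict = "fits" if needed <= available else "exceeds the range"
        print(f"N={procs_per_node}: ~{needed} sockets vs {available} ports ({verdict})")
```

Because the btl term grows with the square of N, the number of processes per node matters far more than the number of nodes in this estimate.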

> But is there a way (an mca param?) to prevent OpenMPI from opening so many ports?
> Indeed, apart from the rank 0 node, every MPI process will need to communicate ONLY with
> the 8 (nearest) neighbour nodes. So there should be a switch somewhere telling OpenMPI
> to open a port ONLY when needed, but I did not find it in the ompi_info output ;-)

It only ever opens a port when something tries to communicate - we never open ports in advance.

> Which one is it ?
> Thanks, Best, G.