Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Issue with : mca_oob_tcp_peer_recv_connect_ack on SGI Altix
From: Gilbert Grosdidier (Gilbert.Grosdidier_at_[hidden])
Date: 2010-12-15 14:30:39

Good evening Ralph,

On 15/12/2010 at 18:45, Ralph Castain wrote:
> It looks like all the messages are flowing within a single job (all
> three processes mentioned in the error have the same identifier). Only
> possibility I can think of is that somehow you are reusing ports - is
> it possible your system doesn't have enough ports to support all the
> procs?
  Every worker node seems to have a range of almost 30k ports available:
> ssh r33i0n0 cat /proc/sys/net/ipv4/ip_local_port_range
32768 61000
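
For reference, the size of that range can be computed directly from the two numbers. The sketch below is only an illustration: the helper names `parse_port_range` and `port_count` are hypothetical, and it assumes both bounds are inclusive.

```python
from pathlib import Path


def parse_port_range(text):
    """Parse the 'low high' pair as printed by ip_local_port_range."""
    low, high = (int(field) for field in text.split())
    return low, high


def port_count(low, high):
    # Assumption: both bounds of the range are usable, i.e. inclusive.
    return high - low + 1


if __name__ == "__main__":
    proc_file = Path("/proc/sys/net/ipv4/ip_local_port_range")
    if proc_file.exists():  # only present on Linux
        low, high = parse_port_range(proc_file.read_text())
        print(f"{low}-{high}: {port_count(low, high)} local ports")
```

Run on a worker node, this prints the range and the number of local ports it contains.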

  AFAIK this is the only way I can get information about it.
Are these 30k ports enough?

  The question is: does Open MPI open ports from every node towards every
other node?
If so, I could understand why it runs short of ports when
I increase the number of nodes.
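
To make that hypothesis concrete, here is a rough back-of-the-envelope sketch. It does not describe how Open MPI actually opens connections; it only assumes that each outgoing connection consumes one local port on the originating node. The function name and the node and process counts are hypothetical, chosen purely for illustration.

```python
def outgoing_connections_per_node(nodes, procs_per_node, peers_per_proc=None):
    """Estimate outgoing connections originating from one node.

    peers_per_proc=None models a full mesh (every local process connects
    to every process on every other node); otherwise each local process
    connects to exactly peers_per_proc remote peers.
    """
    if peers_per_proc is None:
        remote_procs = (nodes - 1) * procs_per_node
        return procs_per_node * remote_procs
    return procs_per_node * peers_per_proc


# Size of the range 32768..61000 reported above, bounds assumed inclusive.
AVAILABLE_PORTS = 61000 - 32768 + 1

if __name__ == "__main__":
    # Hypothetical job size, NOT taken from the actual run.
    nodes, procs_per_node = 512, 8
    full = outgoing_connections_per_node(nodes, procs_per_node)
    sparse = outgoing_connections_per_node(nodes, procs_per_node, peers_per_proc=8)
    print(f"full mesh: {full} connections, fits: {full <= AVAILABLE_PORTS}")
    print(f"8 peers:   {sparse} connections, fits: {sparse <= AVAILABLE_PORTS}")
```

Under these assumptions a full mesh grows linearly with the number of nodes and can exceed the port range, while a neighbour-only pattern stays constant.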

  But is there a way (an MCA parameter?) to prevent Open MPI from opening
so many ports?
Indeed, apart from the rank 0 node, every MPI process needs to
communicate with ONLY
the 8 (nearest) neighbour nodes. So there should be a switch somewhere
telling Open MPI
to open a port ONLY when needed, but I did not find it in the ompi_info
output ;-)
Which one is it?

  Thanks, Best, G.