
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Issue with : mca_oob_tcp_peer_recv_connect_ack on SGI Altix
From: Gilbert Grosdidier (Gilbert.Grosdidier_at_[hidden])
Date: 2010-12-16 03:29:50


Hello Jeff,

On 16/12/2010 01:40, Jeff Squyres wrote:
> On Dec 15, 2010, at 3:24 PM, Ralph Castain wrote:
>
>>> I am not using the TCP BTL, only the openib one. Does this change the number of sockets in use per node?
>> I believe the openib btl opens sockets for connection purposes, so the count is likely the same. An IB person can confirm that...
> Nope -- the openib BTL uses the daemon-based communication mechanism. So it should only use the TCP ports that are already open.
>
> Does this problem *always* happen, or does it only happen once in a great while?
gg= No, this problem happens rather often, almost every other run.
It seems to happen more often as the number of cores increases.

> I've seen a similar problem with the TCP BTL every once in a great while -- where a random, errant (non-Open MPI) process connects to a socket that Open MPI is listening on (regardless of whether it's the TCP BTL or TCP OOB). This causes badness in Open MPI because we don't verify the connector properly, and more importantly, don't handle it nicely when the connector is not Open MPI. I've seen this happen with network malware scanners, for example -- they try to connect to large swaths of TCP ports and sometimes unluckily hit an open Open MPI TCP port.
gg= Is there a way, with the *current* code, to direct Open MPI to use a
restricted range of TCP ports that I can choose at launch time?
Or, conversely, which routine should I patch in my private Open MPI
install to achieve the same result?

  Usually, on the cluster I use, workers are not shared with any other
tasks ... ;;-((

  Thanks, Best, G.

> We have a fix for this coming in a future version of the TCP BTL; looks like we should also harden this up for the TCP OOB as well...
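
To illustrate the kind of hardening Jeff mentions, here is a minimal sketch of a listener that verifies the connecting peer before trusting it, and quietly drops anything that does not identify itself. This is not Open MPI's code or wire format: the `MAGIC` token and the `recv_exact` and `accept_peer` helpers are hypothetical names made up for the illustration.

```python
import socket

# Hypothetical handshake token; NOT Open MPI's actual wire format.
MAGIC = b"OMPI-SKETCH-1"


def recv_exact(conn, n):
    """Read exactly n bytes; return None if the peer closes, errors, or times out."""
    buf = b""
    try:
        while len(buf) < n:
            chunk = conn.recv(n - len(buf))
            if not chunk:          # peer closed before sending enough bytes
                return None
            buf += chunk
    except OSError:                # includes socket.timeout and connection resets
        return None
    return buf


def accept_peer(conn, timeout=2.0):
    """Return True if the connector sent the expected token; otherwise close it quietly."""
    conn.settimeout(timeout)
    if recv_exact(conn, len(MAGIC)) == MAGIC:
        return True
    conn.close()                   # stray connector (e.g. a port scanner): drop it, do not abort
    return False
```

With a check like this, a stray connection from a port scanner would be closed and ignored instead of being handed to the connection-acknowledgement logic as if it were a legitimate peer. The timeout matters too: a connector that sends nothing must not block the listener forever.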