It has nothing to do with OMPI - this is an ssh issue. I suspect you are simply overwhelming the connection system.

Maybe you could tell us what you are actually trying to accomplish - running thousands of mpiruns in parallel seems a tad extreme.
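
If the runs really do have to be started from a loop, one way to avoid flooding ssh is to cap how many mpiruns are in flight at once. This is only a rough sketch: run_throttled is a made-up helper name, the limit of 8 is an arbitrary guess to tune for your cluster, and "wait -n" assumes bash 4.3 or newer.

```shell
#!/usr/bin/env bash
# run_throttled TOTAL MAX_JOBS CMD [ARGS...]
# Hypothetical helper (not part of Open MPI): runs CMD a total of TOTAL
# times, keeping at most MAX_JOBS copies running at the same time.
run_throttled() {
    local total=$1 max_jobs=$2
    shift 2
    local i
    for (( i = 0; i < total; i++ )); do
        # While the number of running background jobs is at the cap,
        # block until any one of them finishes (needs bash >= 4.3).
        while (( $(jobs -rp | wc -l) >= max_jobs )); do
            wait -n
        done
        "$@" &
    done
    wait    # collect the jobs that are still running
}

# Example with assumed values: 2000 runs, at most 8 at a time.
# run_throttled 2000 8 mpirun --hostfile my_hostfile openMPI_test >> file 2>&1
```

The bash keyword "time" cannot be passed through "$@", so per-run timing would need a small wrapper script around mpirun.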

On Jun 4, 2013, at 9:48 AM, vacate <> wrote:

Hello everyone,

After solving my first ssh_exchange_identification problem, 
I feel embarrassed to ask about another one... :'((

I get some "ssh: connect to host XXX.XXX.XXX.XX port 22: connection timed out" errors 
when I run mpirun 2000 times at almost the same time.
My bash shell script:
   for (( index=0; index<2000 ; index++))
   do
          (time mpirun --hostfile my_hostfile openMPI_test &) >> file 2>&1
   done

The problem doesn't "always" happen, just "often". (It seldom works well.)
In addition, the number of "timed out" errors differs from test to test.
(Out of 2000 runs, the error occurs anywhere from 0 to 200 times.)

I tried googling it,
but I couldn't find anyone who has had this ssh problem when opening a lot of ssh connections...
So I thought maybe someone here has had the same problem.


The following are some of the settings I have tried changing:

1. net.ipv4.tcp_fin_timeout=180

2. sudo iptables -A INPUT -p tcp --dport ssh -j ACCEPT

But these changes still didn't solve my problem... 

I still can't figure out where the problem is, or whether there are other potential problems :(((

Does anyone here have any idea about this situation, or has anyone had the same problem?
Is it a problem with my machine or with my system? Or does Open MPI not allow me to do something like this?

I really hope someone can give me a hand...
Thank you all very very very much!!

Best Wishes,
