Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] I got "ssh_exchange_identification" errors when I mpirun over 1500 times almost at the same time
From: vacate (vacatehoping_at_[hidden])
Date: 2013-06-03 23:26:09


Dear Ralph Castain,

Thank you for your reply!!!

Actually, I had already adjusted my /etc/security/limits.conf file:
I raised the "soft nofile" and "hard nofile" values to 65535. Over the past
few days I also tried other limit settings, including "soft memlock",
"hard memlock", and the /proc/sys/fs/file-max file.
It still didn't work...

But in the end, I realized that my "soft nofile" and "hard nofile" values
might not be large enough.
After I raised those values, it finally worked!!! lol
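
For anyone hitting the same error, here is a minimal sketch of how the limits mentioned above can be inspected from the shell that launches mpirun. It only reads values and changes nothing. The variable names are made up for illustration, and the thread does not say which value finally fixed the problem, so none is suggested here.

```shell
#!/usr/bin/env bash
# Per-process open-file limits for the current shell.
# These correspond to "soft nofile" / "hard nofile" in /etc/security/limits.conf.
soft_nofile=$(ulimit -Sn)   # soft limit: the value actually enforced
hard_nofile=$(ulimit -Hn)   # hard limit: the ceiling the soft limit may be raised to
echo "soft nofile: ${soft_nofile}"
echo "hard nofile: ${hard_nofile}"

# System-wide ceiling on open file handles (Linux only; skipped elsewhere).
if [ -r /proc/sys/fs/file-max ]; then
    echo "fs.file-max: $(cat /proc/sys/fs/file-max)"
fi
```

Keep in mind that limits.conf is applied by pam_limits when a session starts, so an edited value only shows up in sessions opened after the change. It is worth running the check on every host in the hostfile, not only on the one where mpirun is started.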

Thank you again for your suggestion!!! It was very useful!!! :))))

On Sun, Jun 2, 2013 at 10:38 PM, Ralph Castain <rhc_at_[hidden]> wrote:

> I would suspect you are hitting limits on the number of open sockets -
> check your limits settings on file descriptors.
>
> On Jun 1, 2013, at 11:43 AM, vacate <vacatehoping_at_[hidden]> wrote:
>
> Hello everybody,
>
> I'm a beginner in Open MPI world.
> Maybe it's a simple problem, but I cannot figure out what is happening...
>
> My situation is this:
> I use 4 hosts in total, and their IP addresses are static.
> I can't launch *mpirun* more than 1500 times at almost the same time
> (though it always works with fewer than 1000 launches).
> I get many "*ssh_exchange_identification: connection closed by remote host*"
> errors.
>
>
> --------------------------------------------------------------------------------------------------------------------------
> My Open MPI version : 1.6.2
>
> --------------------------------------------------------------------------------------------------------------------------
> I use a simple bash shell script to run my Open MPI program (named
> openMPI_test).
> Here is the script:
>
> for (( index=0; index<2000 ; index++))
> do
> (time mpirun --hostfile my_hostfile openMPI_test &) >> file 2>&1
> done
>
>
> p.s.1 The "my_hostfile" file lists the 4 hosts' IP addresses.
> p.s.2 The "openMPI_test" program asks each host to do the same thing, namely:
>     if(rank == 0){ for(i=0 ; i<65535 ; i++) temp = i/(i+1); }
>     else if(rank == 1){ for(i=0 ; i<65535 ; i++) temp = i/(i+1); }
>     else if(rank == 2){ for(i=0 ; i<65535 ; i++) temp = i/(i+1); }
>     else if(rank == 3){ for(i=0 ; i<65535 ; i++) temp = i/(i+1); }
>
> --------------------------------------------------------------------------------------------------------------------------
>
> At first, I thought I had a system problem,
> so I tried modifying my /etc/ssh/sshd_config file.
> I set MaxSessions to 65535 and MaxStartups to 65535,
> but the result made me so sad because it still didn't work :((
>
> I'm not sure whether there are some settings or limits in Open MPI,
> or whether I just have another system problem.
>
> I really hope someone can help me..
> Thank you all very very much!!
>
>
>
> Best Wishes,
> Jen
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>