I have experimented a bit more and found that if I set
OMPI_MCA_plm_rsh_num_concurrent=1024
a job with more than 2,500 processes will start and run.
However, when I searched the Open MPI web site for the variable, I could not
find any mention of it.
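
For reference, a minimal sketch of how the setting above can be applied and
looked up. It assumes a bash-like shell and an rsh/ssh-based launch. The
application name ./my_app and the process count in the comment are
placeholders, and the ompi_info output differs between Open MPI versions
(newer releases may need an extra verbosity option to list the parameter).

```shell
# Environment-variable form, as used in this message.
export OMPI_MCA_plm_rsh_num_concurrent=1024

# Open MPI reads an MCA parameter <name> from the variable OMPI_MCA_<name>,
# so the same value can instead be given on the mpirun command line, e.g.:
#   mpirun --mca plm_rsh_num_concurrent 1024 -np 2500 ./my_app
# (./my_app and -np 2500 are placeholders for illustration only.)

# If ompi_info is installed, ask it to describe the rsh launcher parameters
# and filter for this one. Nothing is printed if the tool is missing or the
# parameter is not listed at the default verbosity.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info --param plm rsh | grep num_concurrent || true
fi
```

The description printed by ompi_info is probably the closest thing to
documentation for parameters that are not covered on the web site.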
Best wishes,
Lydia Heck
> 15. jobs with more that 2, 500 processes will not even start
> (Lydia Heck)
>
> ------------------------------
>
> Message: 15
> Date: Tue, 14 Dec 2010 16:10:01 +0000 (GMT)
> From: Lydia Heck <lydia.heck_at_[hidden]>
> Subject: [OMPI users] jobs with more that 2, 500 processes will not
> even start
> To: users_at_[hidden]
> Message-ID:
> <alpine.LRH.2.00.1012141549220.20537_at_[hidden]>
> Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII
>
>
> About 9 months ago we had a new installation with a system of 1800 cores and at
> the time we found that jobs with more than 1028 cores would not start. At the
> time a colleague found that setting
>
> OMPI_MCA_plm_rsh_num_concurrent=256
>
> helped with the problem.
>
> We have now increased our processor count to more than 2700 cores and a job with
> 2,500 processes does not start.
>
> Is there any advice?
>
> Best wishes,
>
> Lydia Heck
>