Open MPI User's Mailing List Archives

From: Heywood, Todd (heywood_at_[hidden])
Date: 2007-02-02 13:41:20


I have Open MPI running fine for a small to medium number of tasks (a simple
hello or cpi program). But when I try 700 or 800 tasks, it hangs,
apparently on startup. I think this might be related to LDAP, because if I
try to log into my account while the job is hung, I am told that my username
doesn't exist. However, I tried adding -debug to the mpirun command and got
the same sequence of output as for successful smaller runs, until it hung
again. So I added --debug-daemons and got the following (this time mpirun
exited rather than hanging):

...

[blade1:31733] [0,0,0] wrote setup file

--------------------------------------------------------------------------
The rsh launcher has been given a number of 128 concurrent daemons to
launch and is in a debug-daemons option. However, the total number of
daemons to launch (200) is greater than this value. This is a scenario that
will cause the system to deadlock.

To avoid deadlock, either increase the number of concurrent daemons, or
remove the debug-daemons flag.
--------------------------------------------------------------------------
[blade1:31733] [0,0,0] ORTE_ERROR_LOG: Fatal in file ../../../../../orte/mca/rmgr/urm/rmgr_urm.c at line 455
[blade1:31733] mpirun: spawn failed with errno=-6
[blade1:31733] sess_dir_finalize: proc session dir not empty - leaving
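
The condition described in that message can be sketched as below. This is only an illustration of the rule as the message states it, not the Open MPI source; the function name and parameter names are hypothetical, and the numbers are the ones from the output above.

```python
def would_deadlock(num_daemons, num_concurrent, debug_daemons):
    """Return True when the situation described in the message applies.

    Hypothetical helper: the launcher is in debug-daemons mode and the
    total number of daemons exceeds the concurrent-launch limit.
    """
    return bool(debug_daemons) and num_daemons > num_concurrent


# Values from the output above: 200 daemons, limit of 128, --debug-daemons on.
print(would_deadlock(200, 128, True))
# Same run without --debug-daemons: the check no longer triggers.
print(would_deadlock(200, 128, False))
# Same run with a larger (assumed) concurrency limit.
print(would_deadlock(200, 256, True))
```

Under this reading, the abort with --debug-daemons is the launcher refusing to start, which is separate from the original hang seen without that flag.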
 
Any ideas or suggestions appreciated.
 
Todd Heywood