Open MPI User's Mailing List Archives

Subject: [OMPI users] ORTE_ERROR_LOG timeout
From: Alastair Basden (a.g.basden_at_[hidden])
Date: 2008-07-08 10:55:08


Hi,
I've got some code that uses Open MPI, and it sometimes crashes after
printing something like:

[mac1:09654] [0,0,0] ORTE_ERROR_LOG: Timeout in file
base/pls_base_orted_cmds.c at line 275
[mac1:09654] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
line 1166
[mac1:09654] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line
90
mpirun noticed that job rank 1 with PID 9658 on node mac1 exited on signal
6 (Aborted).
2 additional processes aborted (not shown)
[mac1:09654] [0,0,0] ORTE_ERROR_LOG: Timeout in file
base/pls_base_orted_cmds.c at line 188
[mac1:09654] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
line 1198
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons for this job. Returned
value Timeout instead of ORTE_SUCCESS.
--------------------------------------------------------------------------

In this case, all processes were running on the same machine, so it's not a
connection problem. Is this a bug, or is something else wrong? Is there a
way to increase the timeout?

Thanks...