Open MPI User's Mailing List Archives

Subject: [OMPI users] oob-tcp problem, unreachable in orted_comm
From: Åke Sandgren (ake.sandgren_at_[hidden])
Date: 2009-06-06 12:18:27


Just got this in a user job.
Any idea why it complains like this?
The original error was the infamous "RETRY EXCEEDED ERROR", but instead
of killing the job it printed the messages below and the job never died.
I have never seen this happen before.

Open MPI 1.3.2, built with Intel 10.1.
This binary is used a lot (more than 50% of the system walltime) and has
never shown this specific problem, and only rarely the "Retry exceeded
error".

[p-bc2503.hpc2n.umu.se:11892] [[34820,0],0]-[[34820,0],1] oob-tcp: Communication retries exceeded. Can not communicate with peer
[p-bc2503.hpc2n.umu.se:11892] [[34820,0],0] ORTE_ERROR_LOG: Unreachable in file orted/orted_comm.c at line 130
[p-bc2503.hpc2n.umu.se:11892] [[34820,0],0] ORTE_ERROR_LOG: Unreachable in file orted/orted_comm.c at line 130
[p-bc2503.hpc2n.umu.se:11892] [[34820,0],0]-[[34820,0],1] oob-tcp: Communication retries exceeded. Can not communicate with peer

-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: ake_at_[hidden]   Phone: +46 90 7866134 Fax: +46 90 7866126
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se