Open MPI User's Mailing List Archives

Subject: [OMPI users] mpirun hang up randomly
From: Harichand M V (harichandmv_at_[hidden])
Date: 2010-07-09 01:25:11


Hi,

My MPI job randomly terminates with a hangup signal:

..............
...........
  IT:20760 CF: 0.7743 Time: 1540.0 MaxMin:20.69/5 :20.12/12
  IT:20770 CF: 0.7734 Time: 1560.2 MaxMin:20.50/1 :19.31/5
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 9399 on node node1 exited on
signal 1 (Hangup).
--------------------------------------------------------------------------
[node1:09356] filem:rsh: close()
[node1:09356] mca: base: close: component rsh closed
[node1:09356] mca: base: close: unloading component rsh
[node1:09356] mca: base: close: component default closed
[node1:09356] mca: base: close: unloading component default
[node1:09356] mca: base: close: component hnp closed
[node1:09356] mca: base: close: unloading component hnp
[node1:09356] mca: base: close: component round_robin closed
[node1:09356] mca: base: close: unloading component round_robin
[node1:09356] mca: base: close: component rsh closed
[node1:09356] mca: base: close: unloading component rsh
[node1:09356] mca: base: close: component default closed
[node1:09356] mca: base: close: unloading component default
[node1:09356] mca: base: close: component bad closed
[node1:09356] mca: base: close: unloading component bad
[node1:09356] mca: base: close: unloading component binomial
[node1:09356] mca: base: close: component tcp closed
[node1:09356] mca: base: close: unloading component tcp
[node1:09356] mca: base: close: component oob closed
[node1:09356] mca: base: close: unloading component oob
[node1:09356] mca: base: close: unloading component auto_detect
[node1:09356] mca: base: close: unloading component linux

I am using Open MPI version 1.2.7 over InfiniBand.
The application was running on 15 nodes.

The job is started with nohup so that it runs in the background.
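
For illustration, a launch of this kind might look like the sketch below. The executable name, host file, log file, and process count are placeholders, not the actual values used for this job.

```shell
# Sketch only: every name below is a hypothetical placeholder.
NP=15               # assumed process count (one per node); the real count is not stated
HOSTFILE=hosts.txt  # assumed host file listing the 15 nodes
APP=./solver        # assumed application binary
LOG=solver.log      # assumed output file

# nohup makes the launched mpirun ignore SIGHUP; '&' puts it in the background.
# Both stdout and stderr are redirected so nothing is tied to the terminal.
nohup mpirun -np "$NP" --hostfile "$HOSTFILE" "$APP" > "$LOG" 2>&1 &
```

Note that nohup changes the SIGHUP disposition only for the mpirun process it starts directly. Whether the application processes also end up ignoring SIGHUP depends on how they are launched.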

Thanks in advance
Harichand M V