Open MPI User's Mailing List Archives

Subject: [OMPI users] orte-checkpoint hangs
From: Addepalli, Srirangam V (srirangam.v.addepalli_at_[hidden])
Date: 2010-02-10 12:45:11

I am trying to test orte-checkpoint with an MPI job. However, it hangs for every job. This is how the job is started:
mpirun -np 8 -mca ft-enable cr /apps/nwchem-5.1.1/bin/LINUX64/nwchem siosi6.nw
From another terminal I then run orte-checkpoint:

ompi-checkpoint -v --term 9338
[compute-19-12.local:09377] orte_checkpoint: Checkpointing...
[compute-19-12.local:09377] PID 9338
[compute-19-12.local:09377] Connected to Mpirun [[5009,0],0]
[compute-19-12.local:09377] Terminating after checkpoint
[compute-19-12.local:09377] orte_checkpoint: notify_hnp: Contact Head Node Process PID 9338
[compute-19-12.local:09377] orte_checkpoint: notify_hnp: Requested a checkpoint of jobid [INVALID]
[compute-19-12.local:09377] orte_checkpoint: hnp_receiver: Receive a command message.
[compute-19-12.local:09377] orte_checkpoint: hnp_receiver: Status Update.

Is there any way to debug this issue and get more information or log messages?