Open MPI User's Mailing List Archives

Subject: [OMPI users] ompi-restart issue : ompi-restart doesn't work across nodes
From: arun dhakne (arundhakne_at_[hidden])
Date: 2008-09-30 00:52:48


Hi all,

I went through some earlier reports of ompi-restart issues, but I
couldn't find anything similar to this problem.

I have installed BLCR and configured Open MPI 'openmpi-1.3a1r19645'.

i) If the sample MPI program is run on a single machine (np 4, without
any hostfile) and I checkpoint it, the checkpoint succeeds and
ompi-restart works as well.

ii) If the sample MPI program is run across, say, 2 different nodes,
the checkpoint succeeds BUT ompi-restart throws the following
error:

[audhakne@acl-cadi-pentd-1 ~]$ ompi-restart ompi_global_snapshot_7604.ckpt
--------------------------------------------------------------------------
mpirun noticed that process rank 3 with PID 9590 on node
acl-cadi-pentd-1.cse.buffalo.edu exited on signal 11 (Segmentation
fault).
--------------------------------------------------------------------------
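
For reference, the sequence in (i) and (ii) can be sketched roughly as
below. This is only an illustration and not the exact command lines that
were used: the program name ./sample_mpi and the hostfile name "hosts"
are hypothetical, and the "-am ft-enable-cr" launch option and the idea
that 7604 is the PID of mpirun (which the snapshot name suggests) are
assumptions. The script only prints the commands; it does not run them.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of executing them.
# Assumptions (not from the original post): the program name, the
# hostfile name, and the "-am ft-enable-cr" launch option.

APP=./sample_mpi      # hypothetical program name
HOSTFILE=hosts        # hypothetical hostfile listing the two nodes

# Case (i): four processes on a single machine, no hostfile.
launch_single() {
    echo "mpirun -np 4 -am ft-enable-cr $APP"
}

# Case (ii): the same job spread across nodes via a hostfile.
launch_multi() {
    echo "mpirun -np 4 -am ft-enable-cr --hostfile $HOSTFILE $APP"
}

# Checkpoint a running job; $1 is assumed to be the PID of mpirun.
checkpoint_cmd() {
    echo "ompi-checkpoint ${1}"
}

# Restart from the global snapshot, whose name carries that PID.
restart_cmd() {
    echo "ompi-restart ompi_global_snapshot_${1}.ckpt"
}

# Print the multi-node sequence that leads to the failure.
launch_multi
checkpoint_cmd 7604
restart_cmd 7604
```

The single-machine case (i) differs only in the launch line
(launch_single); the checkpoint and restart steps are the same.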

Please let me know if more information is needed.

-- 
Thanks and Regards,
Arun U. Dhakne