When you receive that callback, the MPI library has been put in a quiescent
state. As such, it does not allow MPI communication until the checkpoint is
completely finished, so you cannot call MPI_Barrier in the checkpoint
callback. Since Open MPI is doing a coordinated checkpoint, you can assume
that all processes are calling the same callback at about the same time (the
coordination algorithm synchronizes them for you).
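
To illustrate, here is a minimal sketch of a checkpoint callback that does
only local work and makes no MPI calls. The callback name and signature are
the ones from your mail; the state variable, the file name, and the use of 0
as the success return value are assumptions made for the example, not
something taken from the Open MPI sources.

```c
#include <stdio.h>

/* Hypothetical application state that we want to save. */
static int app_iteration = 0;

/* Hypothetical file name; a real application would likely use a per-rank path. */
static const char *state_path = "app_state.ckpt";

/*
 * Checkpoint callback for the "self" CRS component.
 * The MPI library is quiescent here, so this function must not call
 * MPI_Barrier or any other MPI communication routine. It only writes
 * local state to disk.
 */
int opal_crs_self_user_checkpoint(char **restart_cmd)
{
    FILE *fp;

    (void)restart_cmd;  /* left untouched in this sketch */

    fp = fopen(state_path, "w");
    if (fp == NULL) {
        return -1;      /* assumption: nonzero reports a failure */
    }
    fprintf(fp, "%d\n", app_iteration);
    fclose(fp);

    return 0;           /* assumption: 0 reports success */
}
```

No barrier is needed inside the callback, because the coordination
algorithm has already brought every process to this point.
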
If you would like a notification callback before the quiescence protocol,
you might want to look at the INC callbacks.
They are available in the Open MPI trunk (v1.7). The
callback will give you immediate notice, and you -should- be able to make
MPI calls in that callback. I have not tried it, but conceptually it should
work. If it does not, I can file a bug ticket and we can look into it.
On Wed, Feb 15, 2012 at 4:23 AM, Faisal Shahzad <itsfaisi_at_[hidden]> wrote:
> Dear Group,
> I wanted to do a synchronization check with 'MPI_Barrier(MPI_COMM_WORLD)'
> in 'opal_crs_self_user_checkpoint(char **restart_cmd)' call. Although every
> process is present in this call, it fails to synchronize. Is there any
> reason why we can't use a barrier?
> Thanks in advance.
> Kind regards,
Postdoctoral Research Associate
Oak Ridge National Laboratory