Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] error in ompi-checkpoint
From: Josh Hursey (jjhursey_at_[hidden])
Date: 2009-09-23 09:16:55

How did you configure Open MPI? Is your application using SIGUSR1?

This error message indicates that Open MPI's daemons could not
communicate with the application processes. The daemons send SIGUSR1
to the process to initiate the handshake (you can change this signal
with -mca opal_cr_signal). If your application does not respond to the
daemon within a time bound (default 20 sec, though you can change it
with -mca snapc_full_max_wait_time) then this error is printed, and
the checkpoint is aborted.
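
One quick way to answer the SIGUSR1 question is to ask the process what it currently does with that signal. The sketch below is only an illustration, and it assumes a Python application (for example one using mpi4py); the helper name sigusr1_disposition is made up for this example and is not part of Open MPI. In a C application the analogous check would be sigaction() with a NULL new action.

```python
import signal


def sigusr1_disposition(signum=signal.SIGUSR1):
    """Describe how this process currently handles `signum`.

    Hypothetical helper for illustration only; not an Open MPI API.
    """
    handler = signal.getsignal(signum)
    if handler is None:
        # Python reports None when a handler was installed by non-Python
        # code (for instance, by a C library loaded into the process).
        return "installed outside Python"
    if handler == signal.SIG_DFL:
        return "default"
    if handler == signal.SIG_IGN:
        return "ignored"
    if callable(handler):
        # The application (or one of its Python libraries) set its own handler.
        return "custom"
    return "unknown"


if __name__ == "__main__":
    print("SIGUSR1 disposition:", sigusr1_disposition())
```

If this reports "custom" or "ignored", the application or one of its libraries has taken over SIGUSR1, which could keep the daemon's request from being seen; in that case moving the checkpoint signal with -mca opal_cr_signal may be worth trying.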

-- Josh

On Sep 22, 2009, at 1:43 AM, Mallikarjuna Shastry wrote:

> <error.txt>