Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OPEN MPI with self
From: Josh Hursey (jjhursey_at_[hidden])
Date: 2009-08-19 10:37:51


On Aug 18, 2009, at 11:36 AM, Jean Potsam wrote:

> Dear all,
> I am trying to checkpoint an MPI application using the
> self component. I had a look at the Open MPI FT User's Guide Draft
> 1.4, but I am still unsure.
>
> I have installed openmpi as follows:
>
> jean$ ./configure --prefix=/home/jean/openmpi/ --enable-debug \
>     --enable-mpi-profile --enable-mpi-cxx --enable-binaries \
>     --enable-trace --enable-static=yes --enable-debug \
>     --with-devel-headers=1 --with-mpi-param-check=always \
>     --with-ft=cr --enable-ft-thread --enable-mpi-threads=yes
>
> jean$ make all install
>
> MY questions are:
>
> Q1) Have I properly configured openmpi with self?

Yes, it looks like you have configured it correctly. To double check, you
can look at the config.log file in the build directory for the
following lines (the result should say 'yes'):
----------------
configure:87103: checking if MCA component crs:self can compile
configure:87105: result: yes
----------------

I recently fixed a number of bugs with the 'self' CRS functionality.
So you will want to make sure you are using a recent version of
either the development trunk (anything after r21777) or the v1.3
branch (anything after r21798).

>
> In the document, it is said:
> "To be absolutely clear: these functions are to be provided by the
> application - they are not included in the open mpi library"
>
> Q2) Does this mean that I will have to write my own checkpoint,
> continue, and restart functions and function calls?

The 'self' checkpointer requires the application to write its own
checkpoint, continue, and restart functions. These functions must
have a precise signature since they are called by Open MPI. In
particular they need to look like:
   int opal_crs_self_user_checkpoint(char **restart_cmd);
   int opal_crs_self_user_continue(void);
   int opal_crs_self_user_restart(void);
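
As a rough illustration of what such callbacks might look like, here is a
minimal sketch using the default function names. Everything other than the
three signatures is an assumption made for the example: the 'app_counter'
state, the 'personal-cr.ckpt' file name, returning 0 for success, and
leaving '*restart_cmd' set to NULL. The MPI main() that would normally
accompany these functions is omitted.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical application state and checkpoint file (illustrative only).
 * A real MPI job would likely need one file per rank. */
int app_counter = 0;
const char *ckpt_file = "personal-cr.ckpt";

/* Called at checkpoint time: save the application state to disk.
 * Assumption: 0 means success and *restart_cmd may be left NULL. */
int opal_crs_self_user_checkpoint(char **restart_cmd)
{
    FILE *fp = fopen(ckpt_file, "w");
    if (fp == NULL) {
        return -1;
    }
    fprintf(fp, "%d\n", app_counter);
    fclose(fp);

    if (restart_cmd != NULL) {
        *restart_cmd = NULL;
    }
    return 0;
}

/* Called in the original process after the checkpoint completes:
 * nothing to reload in this sketch. */
int opal_crs_self_user_continue(void)
{
    return 0;
}

/* Called in a restarted process: reload the saved state from disk. */
int opal_crs_self_user_restart(void)
{
    int ok;
    FILE *fp = fopen(ckpt_file, "r");
    if (fp == NULL) {
        return -1;
    }
    ok = (fscanf(fp, "%d", &app_counter) == 1);
    fclose(fp);
    return ok ? 0 : -1;
}
```

The point of the sketch is only the division of labor: checkpoint saves
whatever the application needs, restart reads it back, and continue
resumes normal execution.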

The 'crs_self_prefix' MCA parameter will allow you to customize the
function names a bit. For example:
   shell$ mpirun -np 2 -am ft-enable-cr -mca crs_self_prefix my_personal my-app

This will cause Open MPI to look for functions with the following signatures:
   int my_personal_checkpoint(char **restart_cmd);
   int my_personal_continue(void);
   int my_personal_restart(void);
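
The naming rule can be sketched as prefix, underscore, then one of
'checkpoint', 'continue', or 'restart'. The helper below is hypothetical
and only illustrates that convention as shown in this message; it is not
Open MPI's actual lookup code.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: writes "<prefix>_<suffix>" into buf.
 * Returns 0 on success, -1 if the name does not fit in len bytes. */
int build_callback_name(char *buf, size_t len,
                        const char *prefix, const char *suffix)
{
    int n = snprintf(buf, len, "%s_%s", prefix, suffix);
    if (n < 0 || (size_t)n >= len) {
        return -1;
    }
    return 0;
}
```

With the prefix 'my_personal' this yields the three names listed above;
with 'opal_crs_self_user' it yields the default names.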

>
> Q3) Does anyone have experience with self checkpointing? I would really
> appreciate it if a guide were available.

The C/R FT User's Guide is the only guide that I know of out there. I
attached a sample program that takes advantage of the 'self' CRS system.

To compile:
   mpicc personal-cr.c -export -export-dynamic -o personal-cr

To run with default function names:
   shell$ mpirun -np 2 -am ft-enable-cr personal-cr

To run with custom function names:
   shell$ mpirun -np 2 -am ft-enable-cr -mca crs_self_prefix my_personal personal-cr

-- Josh


>
> Thanks a lot
>
> cheers
>
> JEan