Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Process Migration
From: Josh Hursey (jjhursey_at_[hidden])
Date: 2011-11-10 09:10:02


The MPI standard does not provide explicit support for process
migration. However, some MPI implementations (including Open MPI) have
integrated such support on top of their checkpoint/restart functionality.
For more information about checkpoint/restart-based process migration
in Open MPI, see the links below:
  http://osl.iu.edu/research/ft/ompi-cr/
  http://osl.iu.edu/research/ft/ompi-cr/tools.php#ompi-migrate

I also implemented an MPI Extensions API for this functionality, so you
can call it from within your application:
  http://osl.iu.edu/research/ft/ompi-cr/api.php#api-cr_migrate

These pieces of functionality are currently only available in the Open
MPI development trunk.

-- Josh

On Thu, Nov 10, 2011 at 8:19 AM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> On Nov 10, 2011, at 8:11 AM, Mudassar Majeed wrote:
>
>> Thank you for your reply. I am implementing a load-balancing function for MPI that balances the computation load and the communication load at the same time. My algorithm therefore assumes that the cores may end up running different numbers of processes.
>
> Are you talking about over-subscribing cores?  I.e., putting more than 1 MPI process on each core?
>
> In general, that's not a good idea.
>
>> In the beginning (before that function is called), each core will have an equal number of processes. So I am thinking of starting more processes on each core than needed, running my load-balancing function, and then blocking the surplus processes on each core. In this way I will be able to achieve a different number of processes per core.
>
> Open MPI spins aggressively looking for network progress.  For example, if you block in an MPI_RECV waiting for a message, Open MPI is actively banging on the CPU looking for network progress.  Because of this (and other reasons), you probably do not want to over-subscribe your processors (meaning: you probably don't want to put more than 1 process per core).
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
>
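
A rough illustration of the spinning behaviour Jeff describes above.
This is a plain Python analogy, not Open MPI code: the function names
are made up for the example, and a threading.Event that is never set
stands in for a message that has not arrived yet. It compares the CPU
time consumed by a wait that polls for progress against a wait that
lets the OS put the thread to sleep. The exact numbers will vary by
machine.

```python
import threading
import time


def cpu_seconds_while_waiting(wait, duration=0.2):
    """Return the CPU time this process uses while `wait` runs for `duration` seconds."""
    ready = threading.Event()  # never set: stands in for a message that has not arrived
    start = time.process_time()
    wait(ready, time.monotonic() + duration)
    return time.process_time() - start


def polling_wait(ready, deadline):
    # Busy-poll: keep checking for progress until the deadline passes.
    while not ready.is_set() and time.monotonic() < deadline:
        pass


def sleeping_wait(ready, deadline):
    # Blocking wait: the OS deschedules the thread until the event or the timeout.
    ready.wait(timeout=max(0.0, deadline - time.monotonic()))


spin_cpu = cpu_seconds_while_waiting(polling_wait)
sleep_cpu = cpu_seconds_while_waiting(sleeping_wait)
print(f"polling wait:  {spin_cpu:.3f} s of CPU time")
print(f"sleeping wait: {sleep_cpu:.3f} s of CPU time")
```

The polling wait uses CPU for roughly the whole time it is "idle",
which is why extra processes parked in a blocking receive on an
over-subscribed core would still compete with the ones doing real work.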

-- 
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey