Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Changing BTLs at runtime
From: Josh Hursey (jjhursey_at_[hidden])
Date: 2010-03-29 12:24:30

This line of work sounds interesting. Just wanted to add my 2 cents on
one point below.

On Mar 26, 2010, at 9:46 AM, Christoph Konersmann wrote:

>>> The Background:
>>> I should give some background on why I'm implementing this. Changing
>>> the MPI communication from a high-speed network to a network with
>>> flow control (openib->tcp) is necessary for checkpointing distributed
>>> applications in virtual machines. OK, you are able to checkpoint
>>> through the FT framework and BLCR in Open MPI, but virtual machines
>>> already provide trivial functions for checkpointing. As you are not
>>> able to checkpoint the hardware information of, e.g., openib, you
>>> have to get rid of it in case of a checkpoint, and change back again
>>> on resume/continue.
>> I'm not quite sure I understand. I can see how the original model
>> of CRS and SNAPC doesn't quite fit that of VMs, but I don't quite
>> understand what switching openib -> tcp and then later tcp ->
>> openib gives you...?
>> Can't you just quiesce the openib BTL, let the VM checkpoint, and
>> then resume with openib? (or whatever other non TCP/sm BTL you want)
> I worked under the assumption that a virtualization solution might
> support InfiniBand through SR-IOV, so every virtual machine has the
> possibility to use it at full speed. Just starving out the
> communication between InfiniBand devices would not help in case of
> migration, when the underlying hardware and its configuration would
> change. Therefore I have to unload the desired BTL module. To make
> sure that absolutely no BML uses InfiniBand for transfer anymore, I
> change the communication to another device whose protocol is known
> to work with migrating virtual machines, like tcp.

A few papers have pointed out the difficulties of supporting InfiniBand
in a virtualized environment where migration is a desired feature.
Most solutions involve shutting down the InfiniBand network, moving
the process, then restarting the communication. It's a neat idea to
shift the network load to the TCP network to allow the application to
continue communication (though at diminished performance) during the
migration to work around the InfiniBand issue.

> Checkpointing would work by just quiescing the communication if the
> InfiniBand hardware does not change.

Just wanted to mention that in Open MPI we have the ability to choose
a new set of BTLs on restart in our current C/R infrastructure. So we
can checkpoint process A which was communicating with process B over
'openib', and then restart them on the same machine and have them
transparently switch to 'sm'. Then we can move them apart and have
them pick another set of BTLs for communication (either 'tcp' or back
to 'openib' or something else entirely like 'mx').
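
As a rough illustration of that re-selection idea, here is a toy Python
model. It is not Open MPI code: the function names, the host names, and
the fixed priority order are all assumptions made up for this sketch
(the real BTL selection logic is more involved). It only shows how the
same pair of processes could end up on a different BTL after each
restart, depending on where they land.

```python
# Toy model only: NOT Open MPI code. All names and the priority order
# below are hypothetical and chosen purely for illustration.
PRIORITY = ["sm", "openib", "mx", "tcp"]


def reachable(btl, host_a, host_b):
    """Return True if this toy BTL can connect the two hosts."""
    if btl == "sm":
        # Shared memory only works between processes on the same host.
        return host_a == host_b
    # Assumption: every network BTL can reach every host in this model.
    return True


def select_btl(host_a, host_b, available):
    """Pick the first usable BTL, in priority order, for a process pair."""
    for btl in PRIORITY:
        if btl in available and reachable(btl, host_a, host_b):
            return btl
    raise RuntimeError("no usable BTL between %s and %s" % (host_a, host_b))


# Before the checkpoint: A and B on different nodes that both have IB.
before = select_btl("node1", "node2", {"sm", "openib", "tcp"})
# Restarted on the same machine: the pair can switch to shared memory.
same_host = select_btl("node1", "node1", {"sm", "openib", "tcp"})
# Moved apart again, this time onto nodes without InfiniBand.
moved = select_btl("node3", "node4", {"sm", "tcp"})
```

In this sketch the three calls yield 'openib', 'sm', and 'tcp'
respectively, which mirrors the sequence described above.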

-- Josh

> Kind regards,
> Christoph Konersmann
> --
> Paderborn Center for Parallel Computing - PC2
> University of Paderborn - Germany
> Christoph Konersmann <c_k_at_[hidden]>