Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] xensocket btl and migration
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-03-15 08:37:55


On Mar 9, 2008, at 6:13 AM, Muhammad Atif wrote:

> Okay guys.. with all your support and help in understanding ompi
> architecture, I was able to get Xensocket to work. Only minor
> changes to the xensocket kernel module made it compatible with
> libevent. I am getting results which are bad but I am sure, I have
> to cleanup the code. At least my results have improved over native
> netfront-netback of xen for messages of size larger than 1 MB.

Great! Be aware that we are in the process of updating the version of
libevent that is included in Open MPI. As part of this process, we
are re-enabling the more scalable fd-monitoring mechanisms (such as
epoll and friends). Do you know if xensockets play nicely with epoll?
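One quick way to probe this on Linux is to try registering the descriptor with epoll. The sketch below is only an illustration: demo_fd_supports_epoll() is a hypothetical helper, not part of Open MPI or libevent, and it relies on the documented behavior that epoll_ctl() fails with EPERM when the target file does not support epoll. A successful registration does not by itself prove that readiness events are delivered correctly, so it is a first check rather than a full answer.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper (not an Open MPI or libevent function).
 * Returns  1 if fd can be registered with epoll,
 *          0 if the kernel reports that this file type does not
 *            support epoll (EPERM),
 *         -1 on any other error (bad fd, out of resources, ...). */
static int demo_fd_supports_epoll(int fd)
{
    int epfd = epoll_create1(0);
    if (epfd < 0) {
        return -1;
    }

    struct epoll_event ev = { .events = EPOLLIN };
    ev.data.fd = fd;

    int rc = epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
    int saved_errno = errno;   /* close() below may overwrite errno */
    close(epfd);

    if (rc == 0) {
        return 1;
    }
    return (saved_errno == EPERM) ? 0 : -1;
}
```

Calling this on a descriptor obtained from a xensocket (after loading the kernel module) would show whether registration is accepted at all; event delivery would still need to be tested with real traffic.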

> I started with making minor changes in the TCP btl, but it seems it
> is not the best way, as changes are quite huge and it is better to
> have separate dedicated btl for xensockets. As you guys might be
> aware Xen supports live migration, now I have one stupid question.
> My knowledge so far suggests that btl component is initialized only
> once.

Correct.

> The scenario here is if my guest os is migrated from one physical
> node to another, and realizes that the communicating processes are
> now on one physical host and they should abandon use of TCP btl and
> make use of Xensocket btl. I am sure it would not happen out of the
> box, but is it possible without making heavy changes in the openmpi
> architecture?
> With the current design, i am running a mix of tcp and xensocket
> btls, and endpoints check periodically if they are on same physical
> host or not. This has quite a big penalty in terms of time.

Josh Hursey has been doing much of the checkpoint/restart and
migration work -- I'll let him answer this...

> Another question is (good thing i am using email otherwise you guys
> would beat the hell outta me, its such a basic question). I am not
> able to trace the MPI_Recv(...) API call and similar calls. Once in
> the code of MPI_Recv(..) we give a call to rc =
> MCA_PML_CALL(recv(buf, count ... ). This call goes to the macro, and
> pml.recv(..) gets invoked (mca_pml_base_module_recv_fn_t
> pml_recv;) . Where can I find the actual function? I get totally
> lost when trying to pinpoint what exactly is happening. Basically, I
> am looking for a place where tcp btl recv is getting called with all
> the goodies and parameters which were passed by the MPI programmer.
> I hope I have made my question understandable.

Sorry about all the function pointers -- it's how we have to do this
because of all the plugins...

In the OB1 case, it goes to mca_pml_ob1_recv() (and
mca_pml_ob1_irecv() for the non-blocking case). See
ompi/mca/pml/ob1/pml_ob1.c for a big function table that is passed
back out of the OB1 module. This pattern is repeated for most/all
components in OMPI -- when the component is initialized, it passes
back a table of function pointers for its module that the upper-level
code can call.
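As a rough illustration of that pattern, here is a minimal, self-contained sketch. All demo_* names and the DEMO_PML_CALL macro are made up for this example and are much simpler than the real Open MPI types; the macro only imitates the token-pasting style of MCA_PML_CALL, and the "recv" just fills the buffer so the dispatch path is visible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the real PML types. */
typedef int (*demo_recv_fn_t)(void *buf, size_t count);

typedef struct demo_pml_module {
    const char    *name;
    demo_recv_fn_t pml_recv;   /* slot filled in by the component */
} demo_pml_module_t;

/* One component's recv implementation (plays the role that
 * mca_pml_ob1_recv() plays for OB1). */
static int demo_ob1_recv(void *buf, size_t count)
{
    memset(buf, 0x2a, count);  /* pretend a message arrived */
    return 0;
}

/* The component's function table. */
static demo_pml_module_t demo_ob1_module = { "ob1", demo_ob1_recv };

/* Component init hands the table back to the upper layer. */
static demo_pml_module_t *demo_ob1_component_init(void)
{
    return &demo_ob1_module;
}

/* The upper layer remembers whichever module was selected. */
static demo_pml_module_t *demo_selected = NULL;

static void demo_select_pml(void)
{
    demo_selected = demo_ob1_component_init();
}

/* DEMO_PML_CALL(recv(buf, n)) expands to demo_selected->pml_recv(buf, n). */
#define DEMO_PML_CALL(call) demo_selected->pml_ ## call

/* Plays the role of MPI_Recv(): it never names the component directly. */
static int demo_mpi_recv(void *buf, size_t count)
{
    assert(demo_selected != NULL);
    return DEMO_PML_CALL(recv(buf, count));
}
```

So to find the code behind a call like this, look for where the component fills in its table (the initializer of demo_ob1_module here) rather than searching for the slot name pml_recv.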

-- 
Jeff Squyres
Cisco Systems