Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] btl tcp port to xensocket
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-01-18 19:54:09

On Jan 17, 2008, at 7:08 PM, Muhammad Atif wrote:

> Thanks again. Nope.. at the moment I am doing the lame stuff i.e.
> simply changing the tcp code. So I have not created another btl
> component. I know its not recommended thing, but I just wanted to
> try before committing.

That makes perfect sense. Ok, so you're not running into a component
name collision within the modex; that's good.

> Apart from xensocket specific stuff, all what I have done inside the
> btl/tcp code is to change the structure
> struct mca_btl_tcp_addr_t {
>     struct in_addr addr_inet;   /**< IPv4 address in network byte order */
>     in_port_t addr_port;        /**< listen port */
>     unsigned short addr_inuse;  /**< local meaning only */
>     int xs_domU_ref;            /**< xs: domU memory reference */
> };
> I wanted this structure to be passed on to all peers through
> component exchange (modex send/recv). This way I have the normal
> socket listen port, its address and xensocket memory reference (its
> not complete as it is missing some other info, but lets stick to
> basic stuff).

Sounds reasonable.

> The second question is regarding btl tcp recv. I have seen a couple
> of emails with some explanation specific to that particular user but
> cannot seem to answer this question (ref to previous email).

> Second question is regarding the receive part of openmpi. In my
> understanding, once Recv api is called, the control goes through PML
> layer and everything initializes there. However, I am unable to get
> a lock at the layer/file/function where the receive socket polling
> is done. There are callbacks, but where or how exactly the openMPI
> knows that message has in fact arrived. Any pointer will do :)

All file descriptor processing is handled by libevent down in opal.
libevent is a third party library that we imported into Open MPI (and
modified a bit) that handles generic fd issues. For example, we
register fd's with libevent and tell libevent that we want callbacks
when the fd is ready for reading or writing (depending on the context).

libevent's event loop is invoked by opal_progress(), which is called
in lots of places. Hence, the tcp btl can be called back whenever
opal_progress() is invoked: opal_progress() invokes libevent, and if
any socket fd's that the tcp btl registered are ready for reading, or
if there are pending writes on some socket fd's and those fd's are
ready for writing, their callbacks will be invoked.
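
To illustrate the pattern only: the sketch below is not Open MPI or
libevent code. It uses plain POSIX poll() as a stand-in for libevent,
and register_fd(), progress(), and on_readable() are hypothetical names
made up for this example. The idea is the same, though: fd's are
registered together with a callback, and a non-blocking progress call
fires the callbacks of whichever fd's are ready.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_FDS 16

/* Hypothetical callback type: called when fd is ready for the requested event. */
typedef void (*fd_callback_t)(int fd, void *ctx);

struct fd_entry {
    int fd;
    short events;       /* POLLIN (readable) or POLLOUT (writable) */
    fd_callback_t cb;
    void *ctx;
};

static struct fd_entry registry[MAX_FDS];
static size_t registry_len = 0;

/* Register interest in an fd. Returns 0 on success, -1 if the table is full. */
int register_fd(int fd, short events, fd_callback_t cb, void *ctx)
{
    if (registry_len >= MAX_FDS || cb == NULL) {
        return -1;
    }
    registry[registry_len].fd = fd;
    registry[registry_len].events = events;
    registry[registry_len].cb = cb;
    registry[registry_len].ctx = ctx;
    registry_len++;
    return 0;
}

/* One non-blocking pass over all registered fd's (zero poll timeout).
 * Returns the number of callbacks that were invoked. */
int progress(void)
{
    struct pollfd pfds[MAX_FDS];
    size_t n = registry_len;   /* snapshot, in case a callback registers more */
    size_t i;
    int fired = 0;

    assert(n <= MAX_FDS);
    for (i = 0; i < n; i++) {
        pfds[i].fd = registry[i].fd;
        pfds[i].events = registry[i].events;
        pfds[i].revents = 0;
    }
    if (poll(pfds, (nfds_t)n, 0) <= 0) {
        return 0;              /* nothing ready (or poll failed) */
    }
    for (i = 0; i < n; i++) {
        if (pfds[i].revents & registry[i].events) {
            registry[i].cb(registry[i].fd, registry[i].ctx);
            fired++;
        }
    }
    return fired;
}

/* Example read callback: drain what is available and count the wakeup. */
void on_readable(int fd, void *ctx)
{
    char buf[256];
    int *wakeups = ctx;

    if (read(fd, buf, sizeof buf) > 0) {
        (*wakeups)++;
    }
}
```

With a pipe or socket registered via register_fd(fd, POLLIN,
on_readable, &counter), nothing happens until some caller invokes
progress(); that is the sense in which a receive is "noticed" only
when the progress function runs.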

Make sense?

> PS: I would love if you do some explanation of modex recv as well. ;)
> Thanks for all the support you guys are giving.

I think Adrian was referring to how the modex works. Remember that
the modex send is just a local memcpy; all the modex data is then
glommed up into a single network send communication later. After
that, each process gets a big network message with *everyone's* modex
data, which is then split up and categorized by component and sender.
The modex receive is then another memcpy.
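
As a rough model of that flow (a toy, not the real modex
implementation): the names toy_modex_send(), toy_modex_gather(), and
toy_modex_recv() below are invented for this sketch, and the "network"
step is simulated by appending each process's local table to a shared
one. It shows the send as a local copy, the later bulk exchange, and
the receive as a lookup by component and sender followed by a copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 32
#define MAX_BLOB    64
#define MAX_NAME    16

struct modex_entry {
    char component[MAX_NAME];     /* e.g. "tcp" */
    int sender;                   /* rank that published the blob */
    size_t size;                  /* size as seen by the sender */
    unsigned char data[MAX_BLOB];
};

struct modex_table {
    struct modex_entry entries[MAX_ENTRIES];
    size_t count;
};

/* "Send": only copies the blob into the caller's local table. */
int toy_modex_send(struct modex_table *local, const char *component,
                   int my_rank, const void *blob, size_t size)
{
    struct modex_entry *e;

    assert(local != NULL && component != NULL && blob != NULL);
    if (local->count >= MAX_ENTRIES || size > MAX_BLOB ||
        strlen(component) >= MAX_NAME) {
        return -1;
    }
    e = &local->entries[local->count++];
    strcpy(e->component, component);
    e->sender = my_rank;
    e->size = size;
    memcpy(e->data, blob, size);
    return 0;
}

/* Stand-in for the single bulk exchange: merge one process's local
 * entries into the table that every process ends up holding. */
int toy_modex_gather(struct modex_table *global,
                     const struct modex_table *local)
{
    if (global->count + local->count > MAX_ENTRIES) {
        return -1;
    }
    memcpy(&global->entries[global->count], local->entries,
           local->count * sizeof local->entries[0]);
    global->count += local->count;
    return 0;
}

/* "Receive": find the entry for (component, sender) and copy it out.
 * *size reports the size the sender published. */
int toy_modex_recv(const struct modex_table *global, const char *component,
                   int sender, void *out, size_t out_capacity, size_t *size)
{
    size_t i;

    for (i = 0; i < global->count; i++) {
        const struct modex_entry *e = &global->entries[i];
        if (e->sender == sender && strcmp(e->component, component) == 0) {
            if (e->size > out_capacity) {
                return -1;
            }
            memcpy(out, e->data, e->size);
            *size = e->size;
            return 0;
        }
    }
    return -1;
}
```

Note that in this model the size reported on receive is whatever the
sender copied in, so a mismatch with the receiver's sizeof() would
point at the two sides having been built from different struct
definitions.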

So as to why you're still getting sizeof(mca_btl_tcp_addr_t)==8 in the
tcp modex receiver, the only thing I can think of is that you somehow
didn't recompile properly. Did you try a "make clean" in the tcp btl
dir and then a "make all" to ensure that everything recompiled
properly with your modified struct in btl_tcp_addr.h? Normally, the
build system should take care of such dependencies, but...
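
A quick way to see whether a stale object file is in play is to
compare the struct sizes directly. The sketch below copies the two
layouts under made-up names (tcp_addr_original and tcp_addr_modified
are hypothetical, chosen so they don't clash with the real header);
addr_size_matches() is likewise just an illustration of a check a
receiver could make. On common ABIs the original layout is 8 bytes and
the modified one is 12, but the exact numbers depend on the platform.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdio.h>

/* Layout before the change (copied from the quoted struct, minus the new field). */
struct tcp_addr_original {
    struct in_addr addr_inet;    /* IPv4 address in network byte order */
    in_port_t addr_port;         /* listen port */
    unsigned short addr_inuse;   /* local meaning only */
};

/* Layout after adding the xensocket reference. */
struct tcp_addr_modified {
    struct in_addr addr_inet;
    in_port_t addr_port;
    unsigned short addr_inuse;
    int xs_domU_ref;             /* xs: domU memory reference */
};

/* C11 compile-time check: the extra field must make the struct larger. */
static_assert(sizeof(struct tcp_addr_modified) > sizeof(struct tcp_addr_original),
              "modified struct should be larger than the original");

/* Returns 1 if a received blob has the size of the modified struct. */
int addr_size_matches(size_t received_size)
{
    return received_size == sizeof(struct tcp_addr_modified);
}

/* Print both sizes, e.g. from the send side and the receive side. */
void print_addr_sizes(void)
{
    printf("original: %zu bytes, modified: %zu bytes\n",
           sizeof(struct tcp_addr_original),
           sizeof(struct tcp_addr_modified));
}
```

If the receive side still reports the original size after the header
was changed, some file that includes the header was most likely not
rebuilt.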

Jeff Squyres
Cisco Systems