
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] Replacing poll()
From: George Bosilca (bosilca_at_[hidden])
Date: 2012-03-03 19:54:53

On Mar 3, 2012, at 18:18 , Alex Margolin wrote:

> I've figured out that what I really need is to write my own BTL component, rather than trying to manipulate the existing TCP one. I've started writing it using the 1.5.5rc3 tarball and some PDFs from 2006 that I found on the website (anything else I can look at? TCP is much more complicated than what I'm writing). I think I'm getting the hang of it, but I still have some questions about terminology for the component implementation:
> The basic data structures for routing fragments are components, modules, interfaces and endpoints, right?

Are you trying to route fragments through intermediary nodes? If yes, then I might have a patch somewhere supporting routing for send/recv protocols.

> So, if I have 3 nodes, each with 2 interfaces (each having one constant IP), and I'm running 2 processes total, I'll have... 1 component, 2 modules, 4 interfaces (2 per module) and 4 addresses?
> What about "links" (as in the "num_of_links" component struct member) - what does it count?

The number of sockets to be opened per device. In some cases (for example, when there is a hypervisor) a single socket is not enough to use the device fully. If I remember correctly, on the PS3 three sockets were needed to get 900 Mb/s out of the 1 Gb Ethernet link.

> ompi_modex_send - Is it supposed to share the addresses of all the running processes before they start? Suppose I assume one NIC per machine: can I just send an array of mca_btl_tcp_addr_t, so that every process finds the one belonging to it by some index (its rank?)? I saw the ompi_modex_recv() call in _proc.c, and it seems that every proc instance reads the entire sent buffer anyway.

Right, the modex is used to exchange the "business card" of each process.

> Sorry for flooding you all with questions - I hope I'm not way off here. I hope to finish writing something by the end of next week (I'm working on this after hours, not full time), with the aim of submitting it as a contribution to Open MPI.

Looking forward to it.


> Appreciate your help so far,
> Alex
> On 03/02/2012 09:26 PM, Jeffrey Squyres wrote:
>> Give your BTL a progress function - it'll get called quite frequently.
>> Look at the "progress" section in btl.h. Progress threads don't work yet, but the btl_progress function will get called by the PML quite frequently. That's how BTLs like openib progress their outstanding message passing.
>> On Mar 2, 2012, at 2:22 PM, Alex Margolin wrote:
>>> On 03/02/2012 04:33 PM, Jeffrey Squyres wrote:
>>>> Note that the OMPI 1.4.x series is about to be retired. If you're doing new stuff, I'd advise you to be working with the Open MPI SVN trunk. In the trunk, we've changed how we build libevent, so if you're adding to it, you probably want to be working there for max forward-compatibility.
>>>> That being said:
>>>>> I know trying to replace poll() seems like I'm doing something very wrong, but I want to poll on events without a valid Linux file descriptor (and on existing events, specifically sockets, at the same time), and I see no other way. Obviously, my poll2 calls the Linux poll in most cases.
>>>> What exactly are you trying to do? OMPI has some internal hooks for non-fd-or-event-based progress. Indeed, libevent is typically called with fairly low frequency (e.g., if you're running with OpenFabrics or some other high-speed/not-fd-based networking interconnect).
>>> I'm trying to create a new BTL module. I've written an adapter from my library to TCP, so I've implemented socket/connect/accept/send/recv... Now I've taken the TCP BTL module and cloned it, replacing the relevant calls with mine. My only problem is with poll, which is not in the MCA (at least in 1.4.x).
>>> I've implemented poll() and select(), but it's not that good, because my events are not based on valid Linux file descriptors. I can poll all of my events at the same time, but not in conjunction with real FDs, unfortunately.
>>> Can you give me some pointers as to where to look in the Open MPI (1.5?) source code to implement it properly?
>>> Thanks,
>>> Alex
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]