Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] A Couple of Questions
From: Rolf Vandevaart (Rolf.Vandevaart_at_[hidden])
Date: 2009-04-13 10:05:15

On 04/13/09 09:40, George Bosilca wrote:
> On Apr 12, 2009, at 21:58 , Timothy Hayes wrote:
>> I was wondering if someone might be able to shed some light on a
>> couple of questions I have.
>> When you receive a fragment/base_descriptor in a BTL module, is the
>> raw data allowed to be fragmented when you invoke the callback
>> function? By that I mean, I'm using a circular buffer in each endpoint
>> so sometimes data loops back around. Currently I'm doing a two step
>> copy: from my socket to the circular buffer and then from the circular
>> buffer to the fragment. This actually affects my total throughput
>> quite a bit; it would be much nicer to just point to the buffer
>> instead. When I tried using two base_segments to point to the start
>> and end of buffer I got some pretty strange errors. I'm just wondering
>> if someone could confirm or deny that you can or can't do this, maybe
>> those errors were down to human error instead.
> On the descriptor you can set a number of iovecs containing the raw data.
> You don't have to make it contiguous prior to calling up in the PML. I
> think the PML header has to be contiguous, so you have to make sure that
> the first 32 bytes of the message are contiguous.
>> My other question is about the BTL failover system. Would someone be
>> able to briefly explain how it works or maybe point me to some docs?
>> I'm actually expecting the file descriptors in my module to fail at a
>> certain point during an Open MPI job and I'd like my BTL module to
>> fail gracefully and allow the TCP module to take over in its place.
>> I'm not sure how to explicitly make the BTL module say to the rest
>> of Open MPI "don't use me anymore" though.
> There is no way to say don't use me "at all" anymore. This is handled
> per peer, so you will have to return an error on every peer. Try returning
> OMPI_ERR_OUT_OF_RESOURCE from all functions that allocate descriptors
> (_alloc, _prepare_src and _prepare_dst), and the PML will end up
> removing this BTL from the list.
> george.
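
To illustrate George's point about handing up several segments instead of copying: a minimal sketch in plain C, not Open MPI code. It describes a wrapped region of a circular buffer with at most two POSIX `struct iovec` entries and checks whether a header of a given length is contiguous at the read position. `struct ring`, `ring_to_iov` and `ring_header_contiguous` are hypothetical names, and the 32-byte header length used when calling the check is only the figure George guessed at above.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>   /* struct iovec */

/* Hypothetical ring buffer: `size` bytes of storage, readable data starts at `head`. */
struct ring {
    unsigned char *base;
    size_t size;
    size_t head;
};

/*
 * Describe `len` readable bytes starting at r->head without copying them.
 * Returns the number of iovec entries filled in:
 * 0 for an empty region, 1 if it is contiguous, 2 if it wraps around.
 */
static int ring_to_iov(const struct ring *r, size_t len, struct iovec iov[2])
{
    assert(r->head < r->size && len <= r->size);
    if (len == 0)
        return 0;

    size_t first = r->size - r->head;   /* bytes available before the wrap */
    if (first > len)
        first = len;

    iov[0].iov_base = r->base + r->head;
    iov[0].iov_len  = first;
    if (first == len)
        return 1;

    iov[1].iov_base = r->base;          /* remainder restarts at the buffer start */
    iov[1].iov_len  = len - first;
    return 2;
}

/* True if the first `hdr_len` bytes at r->head do not cross the wrap point. */
static int ring_header_contiguous(const struct ring *r, size_t hdr_len)
{
    return r->size - r->head >= hdr_len;
}
```

If `ring_header_contiguous(&r, 32)` is false, only the header bytes would need to be copied into a small contiguous scratch area; the payload could still be described in place by the two entries.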

We are also looking at mapping out a BTL when we get an error. We are going
down the path of looking at registering a PML OB1 callback function that
gets invoked when we get an error in the BTL. Then this PML OB1
callback function can map out the BTL via a call to
mca_bml.bml_del_btl(btl) which seems to be doing the right thing.

But making this all work requires changes to the PML OB1 layer.

We are also figuring out what we do for retransmission when we get an error.
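
The "map out on error" idea can be sketched with a toy model in plain C. This is not the OB1 or BML code, and every name in it (`struct transport`, `struct peer`, `on_transport_error`, `peer_select`) is hypothetical. Each peer keeps its own priority-ordered list of usable transports, and the error callback removes the failed one from every peer, which is roughly the role the message above gives to `mca_bml.bml_del_btl(btl)`. Retransmission of in-flight data is not modeled.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TRANSPORTS 4

/* Hypothetical stand-in for a BTL module; only its identity matters here. */
struct transport { const char *name; };

/* Hypothetical per-peer list of usable transports, highest priority first. */
struct peer {
    struct transport *avail[MAX_TRANSPORTS];
    size_t count;
};

/* Remove `t` from one peer's list, preserving order. Returns 1 if removed. */
static int peer_del_transport(struct peer *p, const struct transport *t)
{
    assert(p->count <= MAX_TRANSPORTS);
    for (size_t i = 0; i < p->count; i++) {
        if (p->avail[i] == t) {
            for (size_t j = i + 1; j < p->count; j++)
                p->avail[j - 1] = p->avail[j];
            p->count--;
            return 1;
        }
    }
    return 0;
}

/* Error callback: map the failed transport out of every peer's list. */
static void on_transport_error(struct peer *peers, size_t npeers,
                               const struct transport *failed)
{
    for (size_t i = 0; i < npeers; i++)
        peer_del_transport(&peers[i], failed);
}

/* Pick the highest-priority remaining transport, or NULL if none is left. */
static struct transport *peer_select(const struct peer *p)
{
    return p->count > 0 ? p->avail[0] : NULL;
}
```

With a custom transport listed ahead of TCP for a peer, calling `on_transport_error` for the custom one leaves `peer_select` returning TCP for that peer, which is the graceful takeover Timothy asked about.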