On Apr 12, 2009, at 21:58 , Timothy Hayes wrote:
> I was wondering if someone might be able to shed some light on a
> couple of questions I have.
> When you receive a fragment/base_descriptor in a BTL module, is the
> raw data allowed to be fragmented when you invoke the callback
> function? By that I mean, I'm using a circular buffer in each
> endpoint so sometimes data loops back around. Currently I'm doing a
> two step copy: from my socket to the circular buffer and then from
> the circular buffer to the fragment. This actually affects my total
> throughput quite a bit, it would be much nicer to just point to the
> buffer instead. When I tried using two base_segments to point to the
> start and end of buffer I got some pretty strange errors. I'm just
> wondering if someone could confirm or deny that you can or can't do
> this, maybe those errors were down to human error instead.
On the descriptor you can set several iovecs that point to the raw
data. You don't have to make the data contiguous before calling up
into the PML. I think the PML header has to be contiguous, though, so
make sure that the first 32 bytes of the message are contiguous.
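
As a rough sketch of that idea (not actual Open MPI code): the helper
below describes a message sitting in a circular buffer with at most two
iovecs, and copies only the header when it straddles the wrap point.
It uses the POSIX struct iovec as a stand-in for the BTL segment type;
ring_t, ring_to_iov and HDR_LEN are names made up for illustration, and
the 32-byte figure is just the guess given above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>   /* struct iovec (stand-in for a BTL segment) */

#define HDR_LEN 32     /* assumed size of the header that must be contiguous */

/* Hypothetical circular buffer: `size` bytes, message starts at `head`. */
typedef struct {
    unsigned char *data;
    size_t size;
    size_t head;
} ring_t;

/*
 * Describe `len` bytes starting at r->head using at most two iovecs.
 * Payload bytes are never copied. Only if the first HDR_LEN bytes
 * straddle the wrap point are they copied into `hdr_scratch` (which
 * must hold HDR_LEN bytes). Returns the number of iovecs filled in.
 */
int ring_to_iov(const ring_t *r, size_t len,
                unsigned char *hdr_scratch, struct iovec iov[2])
{
    assert(r->head < r->size && len <= r->size);

    size_t first = r->size - r->head;          /* bytes before the wrap */
    size_t hdr = len < HDR_LEN ? len : HDR_LEN;

    if (len == 0)
        return 0;

    if (len <= first) {                        /* no wrap at all */
        iov[0].iov_base = r->data + r->head;
        iov[0].iov_len = len;
        return 1;
    }

    if (first >= hdr) {                        /* wrap falls after the header */
        iov[0].iov_base = r->data + r->head;
        iov[0].iov_len = first;
        iov[1].iov_base = r->data;
        iov[1].iov_len = len - first;
        return 2;
    }

    /* Header straddles the wrap: copy just the header into scratch. */
    memcpy(hdr_scratch, r->data + r->head, first);
    memcpy(hdr_scratch + first, r->data, hdr - first);
    iov[0].iov_base = hdr_scratch;
    iov[0].iov_len = hdr;
    if (len == hdr)
        return 1;
    iov[1].iov_base = r->data + (hdr - first); /* rest lies after the wrap */
    iov[1].iov_len = len - hdr;
    return 2;
}
```

In the worst case this costs one 32-byte copy per message instead of a
copy of the whole payload.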
> My other question is about the BTL failover system. Would someone
> be able to briefly explain how it works or maybe point me to some
> docs? I'm actually expecting the file descriptors in my module to
> fail at a certain point during an Open MPI job and I'd like my BTL
> module to fail gracefully and allow the TCP module to take over in
> its place. I'm not sure how to explicitly make the BTL module
> say to the rest of Open MPI "don't use me anymore" though.
There is no way to say "don't use me at all anymore". This is handled
per peer, so you will have to return an error for every peer. Try
returning OMPI_ERR_OUT_OF_RESOURCE from all functions that allocate
descriptors (_alloc, _prepare_src and _prepare_dst), and the PML will
end up removing this BTL from the list.
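
A minimal sketch of the per-peer pattern, assuming a "failed" flag kept
on each endpoint. Everything prefixed sketch_ or SKETCH_ is a stand-in
invented for illustration; these are not Open MPI's real types,
signatures or error constants, so check the BTL interface header for
how your allocation functions actually report failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-ins for illustration only, NOT Open MPI's real constants. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_OUT_OF_RESOURCE = -2 };

typedef struct { bool failed; } sketch_endpoint_t;      /* one per peer */
typedef struct { void *buf; size_t len; } sketch_desc_t;

/* Call this when the peer's file descriptor dies. */
void sketch_endpoint_mark_failed(sketch_endpoint_t *ep)
{
    ep->failed = true;
}

/*
 * Allocation entry point: refuses once this peer has failed.
 * The same check would sit at the top of the prepare_src and
 * prepare_dst equivalents.
 */
int sketch_alloc(sketch_endpoint_t *ep, size_t len, sketch_desc_t *out)
{
    if (ep->failed)
        return SKETCH_ERR_OUT_OF_RESOURCE;

    out->buf = malloc(len);
    if (out->buf == NULL)
        return SKETCH_ERR_OUT_OF_RESOURCE;
    out->len = len;
    return SKETCH_SUCCESS;
}
```

Because the flag lives on the endpoint, one peer can be failed over
while the others keep using the module.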
> Happy Easter