Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [EXTERNAL] Re: RFC: ob1: fallback on put/send on rget failure
From: Barrett, Brian W (bwbarre_at_[hidden])
Date: 2012-03-19 09:44:32


I'm not sure I'm the best one to comment on OB1 these days, but I didn't
see anything obviously wrong.

Brian

On 3/19/12 9:32 AM, "Jeffrey Squyres" <jsquyres_at_[hidden]> wrote:

>George / Brian --
>
>Can you guys comment on this patch?
>
>
>On Mar 15, 2012, at 5:07 PM, Nathan Hjelm wrote:
>
>> What: Update ob1 to do the following:
>> - fall back on send after rdma_put_retries_limit failures of prepare_dst
>> - fall back on put (single, non-pipelined) if the btl returns
>>   OMPI_ERR_NOT_AVAILABLE on a get transaction.
>>
>> When: Timeout in about one week (Mar 22)
>>
>> Why: Two reasons:
>> - Some btls (ugni) need to switch to put for certain transactions. It
>>   makes sense to make this switch at the pml level.
>> - If prepare_dst repeatedly fails for a get transaction, we currently
>>   deadlock. We can avoid the deadlock (in most cases) by switching to
>>   send for the transaction.
>>
>> Please take a look at the attached patch. Feedback and constructive
>> criticism are needed!
>>
>> -Nathan Hjelm
>> HPC-3, LANL
>> <ompi_trunk_ob1_get_fallback.patch.gz>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
>--
>Jeff Squyres
>jsquyres_at_[hidden]
>For corporate legal information go to:
>http://www.cisco.com/web/about/doing_business/legal/cri/
>

-- 
  Brian W. Barrett
  Dept. 1423: Scalable System Software
  Sandia National Laboratories
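
The two fallback rules in the quoted RFC can be sketched as a small decision function. This is only an illustration of the described behavior, not the attached patch. Everything in it is hypothetical: the `sketch_*` types, the `SKETCH_*` status codes (stand-ins for the OMPI_* codes, with placeholder values), and the fake BTL callbacks. It also retries prepare_dst in a simple loop, which is a simplification; the real pml may well schedule retries differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes; values are placeholders, not Open MPI's. */
enum {
    SKETCH_SUCCESS             = 0,
    SKETCH_ERR_OUT_OF_RESOURCE = -2,
    SKETCH_ERR_NOT_AVAILABLE   = -16
};

/* Protocol selected for one transaction. */
typedef enum { PROTO_GET, PROTO_PUT, PROTO_SEND, PROTO_ERROR } protocol_t;

/* Hypothetical stand-in for a BTL: two callbacks plus opaque state. */
typedef struct {
    int (*prepare_dst)(void *ctx);  /* prepare the destination buffer */
    int (*get)(void *ctx);          /* start an RDMA get */
    void *ctx;
} sketch_btl_t;

/*
 * Rule 1: after retries_limit failures of prepare_dst, fall back on send.
 * Rule 2: if the get reports "not available", fall back on a single put.
 * Any other get error is reported as PROTO_ERROR (not modeled further).
 */
protocol_t choose_protocol(const sketch_btl_t *btl, int retries_limit)
{
    int failures = 0;
    int rc;

    assert(btl != NULL && retries_limit >= 0);

    while (btl->prepare_dst(btl->ctx) != SKETCH_SUCCESS) {
        if (++failures >= retries_limit) {
            return PROTO_SEND;   /* avoid waiting forever on prepare_dst */
        }
    }

    rc = btl->get(btl->ctx);
    if (rc == SKETCH_ERR_NOT_AVAILABLE) {
        return PROTO_PUT;        /* btl asked for put instead of get */
    }
    return (rc == SKETCH_SUCCESS) ? PROTO_GET : PROTO_ERROR;
}

/* Fake BTL for experiments: prepare_dst fails a fixed number of times. */
typedef struct {
    int prepare_failures_left;
    int get_rc;
} fake_state_t;

int fake_prepare_dst(void *ctx)
{
    fake_state_t *s = (fake_state_t *)ctx;
    if (s->prepare_failures_left > 0) {
        s->prepare_failures_left--;
        return SKETCH_ERR_OUT_OF_RESOURCE;
    }
    return SKETCH_SUCCESS;
}

int fake_get(void *ctx)
{
    return ((fake_state_t *)ctx)->get_rc;
}
```

With a retry limit of 3, a fake BTL whose prepare_dst fails five times selects send, one that fails twice still selects get, and one whose get returns the "not available" code selects put.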