Open MPI Development Mailing List Archives

Subject: [OMPI devel] Summary of the problem with r26626
From: Nathan Hjelm (hjelmn_at_[hidden])
Date: 2012-07-12 12:50:55

After some digging, Terry and I found the problem with r26626. To perform an rdma transaction, the pmls used to explicitly promote the seg_addr returned by prepare_src/dst to 64 bits before sending it over the wire. The other end would then (inconsistently) use the lval to perform the get/put. Segments are now opaque objects, so the pmls simply memcpy the segments into the rdma header without promoting seg_addr. As a result, the put and get paths now contain a mixture of lvals and pvals, which will not work in two cases: 32-bit environments and mixed 32/64-bit environments.
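
To make the failure mode concrete, here is a minimal sketch. It assumes seg_addr is a union of a 64-bit integer (lval) and a native pointer (pval); the demo_* types and functions are hypothetical stand-ins, not the actual Open MPI definitions. It simulates a 32-bit sender that stores only the pointer-sized part of the union, after which the receiver reads all 64 bits:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the pointer union behind seg_addr. */
typedef union {
    uint64_t lval;  /* 64-bit value that is meant to travel over the wire */
    void    *pval;  /* native pointer; only 4 bytes wide on a 32-bit build */
} demo_ptr_t;

/* Explicit promotion, as the pmls used to do before sending. */
static uint64_t demo_promote(const void *p)
{
    return (uint64_t)(uintptr_t)p;
}

/* Simulate a 32-bit sender that writes only pval: just the first 4 bytes
 * of the union are updated, and the other 4 keep whatever was there
 * before (modelled here by `stale`). Returns what a reader of lval sees. */
static uint64_t demo_pval_only_store(uint32_t addr32, uint64_t stale)
{
    demo_ptr_t a;
    a.lval = stale;
    memcpy(&a, &addr32, sizeof addr32);  /* pointer-sized write only */
    return a.lval;
}

/* Small self-check showing the difference between the two paths. */
static void demo_check(void)
{
    /* Without promotion, lval is not the address the sender meant. */
    assert(demo_pval_only_store(0x1000u, UINT64_MAX) != UINT64_C(0x1000));
    /* With promotion, the 64-bit value is exactly the address. */
    assert(demo_promote((void *)(uintptr_t)0x1000) == UINT64_C(0x1000));
}
```

On a 64-bit build pval and lval occupy the same 8 bytes, which is presumably why the problem only shows up in the 32-bit and mixed cases.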

I can think of a few ways to fix this:

 - Require the pmls to explicitly promote seg_addr to 64 bits after the memcpy. This is a band-aid fix, but I can implement and commit it very quickly (it will work fine until a more permanent solution is found).
 - Require prepare_src/dst to return segments with 64-bit addresses for all rdma fragments (0 == reserve). This is relatively simple for most btls but a little more complicated for openib, because the openib btl may pack data for a get/put into a send segment. The obvious way to handle this case is to set the lval in prepare_src and restore the pval when the send fragment is returned.
 - Change the btl interface in a way that allows the btl to prepare segments specifically to be sent to another machine. This is a bit more complicated and would require lots of discussion and an RFC.
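
A rough sketch of the first two options, under the same assumption that seg_addr is a union of lval and pval. All demo_* names, the segment layout, and the header layout are hypothetical and only illustrate the idea; the real pml and btl code paths will differ:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins, not the real Open MPI types. */
typedef union { uint64_t lval; void *pval; } demo_ptr_t;
typedef struct { demo_ptr_t seg_addr; uint32_t seg_len; } demo_segment_t;
typedef struct { demo_segment_t seg; } demo_rdma_hdr_t;

/* Option 1: the pml copies the opaque segment into the rdma header and
 * then promotes the address to 64 bits before it goes over the wire. */
static void demo_pack_hdr(demo_rdma_hdr_t *hdr, const demo_segment_t *seg)
{
    void *p = seg->seg_addr.pval;          /* native pointer set by the btl */
    memcpy(&hdr->seg, seg, sizeof *seg);
    hdr->seg.seg_addr.lval = (uint64_t)(uintptr_t)p;
}

/* Option 2: prepare_src/dst hands back a segment that already carries a
 * 64-bit address, so the pml's plain memcpy is enough. */
static void demo_prepare(demo_segment_t *seg, void *buf, uint32_t len)
{
    memset(seg, 0, sizeof *seg);
    seg->seg_addr.lval = (uint64_t)(uintptr_t)buf;
    seg->seg_len = len;
}

/* Option 2, return path: restore the native pointer when the fragment
 * comes back (the case described above for packed send segments). */
static void demo_restore(demo_segment_t *seg)
{
    uint64_t wire = seg->seg_addr.lval;
    assert(wire <= UINTPTR_MAX);           /* must fit a local pointer */
    seg->seg_addr.pval = (void *)(uintptr_t)wire;
}
```

The third option changes the interface itself, so it is not sketched here.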

I am open to suggestions.