On May 31, 2007, at 7:25 PM, Ralph Campbell wrote:
> I can run the Intel MPI benchmarks OK at np=2 but at np=4,
> it hangs.
> If I change /usr/share/openmpi/mca-btl-openib-hca-params.ini
> [QLogic InfiniPath]
> use_eager_rdma = 0
FYI, you can change such values on the command line and/or
environment -- see
http://www.open-mpi.org/faq/?category=tuning#setting-mca-params.
The MCA parameter in question is btl_openib_use_eager_rdma.
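For example, the ini-file key use_eager_rdma should be settable for a
single run like this (assuming the MCA parameter name
btl_openib_use_eager_rdma and the IMB binary name IMB-MPI1 -- adjust
to your setup):

```shell
# On the mpirun command line:
mpirun --mca btl_openib_use_eager_rdma 0 -np 4 ./IMB-MPI1

# ...or via the environment (OMPI_MCA_ prefix + parameter name):
export OMPI_MCA_btl_openib_use_eager_rdma=0
mpirun -np 4 ./IMB-MPI1
```

That avoids editing the system-wide
/usr/share/openmpi/mca-btl-openib-hca-params.ini just to experiment.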
> Then, it gets much farther before hanging on 2MB+ messages.
> If I create .openmpi/mca-params.conf with
> min_rdma_size = 2147483648
> The benchmark completes reliably.
Yoinks. I assume you mean btl_openib_min_rdma_size, right? (note
that the name slightly changed for the upcoming 1.3 [i.e., the SVN
trunk]; although the old name is deprecated, it'll still work)
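In other words, something like this in $HOME/.openmpi/mca-params.conf
(the btl_openib_-prefixed spelling is the current name; the bare
min_rdma_size you used is the deprecated form):

```
# Push the RDMA threshold to 2 GB, i.e., effectively route all
# messages through the send/receive path instead of the RDMA pipeline
btl_openib_min_rdma_size = 2147483648
```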
> When the hang happens, the ipath driver thinks all the posted
> work requests and completion entries have been generated
> and openmpi seems to think they haven't all completed.
> Can someone point me to the code where RDMA write is polled
> on the destination node?
All the OFA code in OMPI is in ompi/mca/btl/openib (i.e., the
"openib" BTL plugin).
The completion polling occurs in btl_openib_component.c, in two main
functions: btl_openib_component_progress() and
btl_openib_module_progress(). The component progress function mainly
checks for eager RDMA completions; if there are none (which will
always be the case here, since you set use_eager_rdma to 0), it falls
through to the module progress() function. There's one module
"instance" for each HCA port, so we basically loop over the modules,
checking each port in turn.
Galen tells me that it may be a little more subtle than this, such as
an ordering issue -- he's going to reply with more detail shortly.