Open MPI Development Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-07-31 17:54:31


Short version:
--------------

The modular wireup code on /tmp/jms-modular-wireup seems to be
working. Can people give it a whirl before I bring it back to the
trunk? The more esoteric your hardware setup, the better.

Longer version:
---------------

I think that I have completed round 1 of the modular wireup work in
/tmp/jms-modular-wireup, meaning that all the wireup code has been
moved out of btl_openib_endpoint.* and into connect/*. The
endpoint.c file now simply calls the connect interface through a
function pointer (allowing the choice of the current RML-based wireup
or the RDMA CM). The selected connect "module" will call back to the
openib endpoint for two things:

1. to post receive buffers on a locally-created-but-not-yet-connected qp
2. to signal that the qp is fully connected and ready to be used
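
As a rough illustration of the arrangement described above (this is not
the code on the branch), the sketch below shows an endpoint that starts
wireup through a function pointer and a connect module that calls back
into the endpoint for the two events. Every name in it (endpoint_t,
connect_module_t, endpoint_post_recvs, endpoint_connected,
rml_start_connect, rdma_cm_start_connect, and the value of
ERR_NOT_IMPLEMENTED) is hypothetical and chosen only for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical error code; the real constant and its value may differ. */
#define ERR_NOT_IMPLEMENTED (-1)

/* Hypothetical endpoint state, not the real openib endpoint struct. */
typedef struct endpoint {
    bool recvs_posted;  /* receive buffers posted on the new qp */
    bool connected;     /* qp fully connected and usable */
} endpoint_t;

/* Callback 1: post receive buffers on a created-but-unconnected qp. */
static int endpoint_post_recvs(endpoint_t *ep)
{
    ep->recvs_posted = true;
    return 0;
}

/* Callback 2: the qp is fully connected and ready to be used. */
static void endpoint_connected(endpoint_t *ep)
{
    /* Receives must already be posted before traffic can arrive. */
    assert(ep->recvs_posted);
    ep->connected = true;
}

/* The connect interface: the endpoint only ever sees this pointer type. */
typedef int (*connect_start_fn_t)(endpoint_t *ep);

typedef struct connect_module {
    const char *name;
    connect_start_fn_t start_connect;
} connect_module_t;

/* Stand-in for the RML-based wireup: a real module would create the qp
 * and exchange connection info with the peer between the two callbacks. */
static int rml_start_connect(endpoint_t *ep)
{
    int rc = endpoint_post_recvs(ep);
    if (rc != 0) {
        return rc;
    }
    endpoint_connected(ep);
    return 0;
}

/* Stand-in for the RDMA CM module, which is not implemented yet. */
static int rdma_cm_start_connect(endpoint_t *ep)
{
    (void)ep;
    return ERR_NOT_IMPLEMENTED;
}

static const connect_module_t connect_rml = { "rml", rml_start_connect };
static const connect_module_t connect_rdma_cm = { "rdma_cm", rdma_cm_start_connect };

/* Endpoint-side entry point: dispatch to whichever module was selected. */
static int endpoint_start_wireup(endpoint_t *ep, const connect_module_t *module)
{
    return module->start_connect(ep);
}
```

With this shape, calling endpoint_start_wireup() with &connect_rml marks
the endpoint connected, while &connect_rdma_cm returns the
not-implemented error and leaves the endpoint untouched. The endpoint
code never needs to know which wireup mechanism is in use.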

This cleaned up the endpoint.* code a *lot*. I also simplified the
RML connection code a bit -- I removed some useless sub-functions, etc.

I *think* that this new connection code is all working, but per
http://www.open-mpi.org/community/lists/devel/2007/07/2058.php, I'm
seeing other weird failures so I'm a little reluctant to put this
back on the trunk until I know that everything is working properly.
Granted, the failures in the other post sound like pml errors and
this should be a wholly separate issue (we would get different
warnings/errors if the btl failed to connect), but still -- it seems
a little safer to be prudent.

Still to do:

- exchange the static rate and set it properly during the RML
wireup
- RDMA CM support (it returns ERR_NOT_IMPLEMENTED right now)

-- 
Jeff Squyres
Cisco Systems