Is doing blocking communication in a separate thread better than asynchronous progress? (At least as a workaround until the proper implementation gets improved.)
At the moment, yes. OMPI's asynchronous progress is "loosely tested" (at best).
OMPI's threading support is somewhat stable for some devices (though not, e.g., for OpenFabrics-based networks), but it's still fairly new, so feedback would be welcome here.