We have run across an issue, probably related more to openib than to
openmpi, but we don't know how to resolve it.
Linux kernel - 2.6.9-55.0.2.ELsmp x86_64
openmpi - the version doesn't seem to matter; 1.1.5 and 1.2.3 both fail.
When the sample code is run across IB nodes, using the IB interface, the
receive hangs whenever a system() call is issued. Removing the system()
call removes the hang. Running across the nodes over TCP removes the
hang. Running on a single node removes the hang. Only when using the IB
interface do we see this hang.
So the simple solution is "don't do this", but apparently something
deeper is involved, and who knows where it will pop up again.
PS - the sample code was compiled using mpicc, built with gcc. You'll
need a test.dat file for the system("cp") command.
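
For anyone who wants to try this, here is a minimal sketch of the kind
of test described above. It is not the original sample code: the
message pattern (two sends from rank 0, with rank 1 calling system()
between its two receives), the buffer size, the output name test.copy,
the helper name copy_with_system, and the USE_MPI compile switch are
all assumptions made for illustration. The MPI part is guarded by
USE_MPI only so that the copy helper can also be compiled and checked
on a machine without MPI installed.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: copy src to dst through the shell.
 * system() runs the command in a child process; this is the step
 * that coincides with the hang in the report above. */
static int copy_with_system(const char *src, const char *dst)
{
    char cmd[512];
    int n = snprintf(cmd, sizeof cmd, "cp '%s' '%s'", src, dst);

    if (n < 0 || (size_t)n >= sizeof cmd)
        return -1;                /* command did not fit in the buffer */
    return system(cmd);           /* 0 when cp succeeds */
}

#ifdef USE_MPI                    /* assumed switch: mpicc -DUSE_MPI */
#include <mpi.h>

#define COUNT 1024                /* assumed message size, in ints */

int main(int argc, char **argv)
{
    int rank;
    int buf[COUNT] = {0};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Two plain blocking sends to rank 1, tags 0 and 1. */
        MPI_Send(buf, COUNT, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Send(buf, COUNT, MPI_INT, 1, 1, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(buf, COUNT, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

        /* Assumed placement of the system("cp") call: between receives. */
        copy_with_system("test.dat", "test.copy");

        MPI_Recv(buf, COUNT, MPI_INT, 0, 1, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1: both receives completed\n");
    }

    MPI_Finalize();
    return 0;
}
#endif /* USE_MPI */
```

To use it as an MPI test, build with mpicc -DUSE_MPI and run two ranks
on different nodes with test.dat in the working directory. If the
transport needs to be forced, something like "mpirun --mca btl
openib,self -np 2 ..." versus "mpirun --mca btl tcp,self -np 2 ..."
should select IB or TCP, though the exact options may depend on the
installation.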