I've just found this information on nVidia's plans regarding enhanced
support for MPI in their CUDA toolkit:
The idea that two GPUs can talk to each other via network cards, without
the CPU acting as a middleman, looks very promising.
This technology is supposed to be revealed and released in September.
1. Will OpenMPI include RDMA support in its CUDA interface?
2. Any idea how much this technology can reduce the CUDA Send/Recv latency?
3. Any idea whether this technology will be available for Fermi-class
Tesla devices, or only for Kepler devices?