As I understand it, to send short MPI messages, OpenMPI copies the
message into a preallocated buffer and then uses RDMA.
I was wondering whether we can avoid the overhead of that memory copy. If the
user buffers for short messages are reused a lot, we could just register
the user buffer instead of using the preallocated buffer, and then do
RDMA directly from the user buffer.
But if the user buffers are not reused, we would pay the
overhead of memory registration on every send.
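To make the trade-off concrete, here is a rough cost model in Python. It is only a sketch, not Open MPI code. It compares copying each message into a pre-registered buffer against registering user buffers on demand, with an LRU cache so a reused buffer is registered only once. RegistrationCache, copy_path_cost and register_path_cost are hypothetical names, and all cost constants are made-up illustrative numbers, not measurements.

```python
from collections import OrderedDict

# Illustrative costs in nanoseconds (assumptions, not measured values).
COPY_NS_PER_KIB = 100      # memcpy of 1 KiB into the preallocated buffer
REGISTER_NS = 20_000       # pin + register a user buffer with the NIC
DEREGISTER_NS = 10_000     # unpin + deregister when a cache entry is evicted


class RegistrationCache:
    """Hypothetical LRU cache of registered user buffers.

    Keyed by (address, length) for simplicity; a real cache would match
    page ranges and must invalidate entries when the application frees
    memory, which this sketch ignores.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.cost = 0

    def acquire(self, addr, length):
        key = (addr, length)
        if key in self.entries:
            # Hit: the buffer is already registered, so no extra cost.
            self.entries.move_to_end(key)
            return
        if len(self.entries) >= self.capacity:
            # Cache full: deregister the least recently used buffer.
            self.entries.popitem(last=False)
            self.cost += DEREGISTER_NS
        self.entries[key] = True
        self.cost += REGISTER_NS


def copy_path_cost(sends):
    """Total cost of copying every message into a pre-registered buffer."""
    return sum(length * COPY_NS_PER_KIB / 1024 for _, length in sends)


def register_path_cost(sends, capacity=64):
    """Total cost of registering user buffers on demand through the cache."""
    cache = RegistrationCache(capacity)
    for addr, length in sends:
        cache.acquire(addr, length)
    return cache.cost


reused = [(0x1000, 1024)] * 1000                          # same buffer every time
fresh = [(0x1000 + i * 4096, 1024) for i in range(1000)]  # new buffer each send
print(copy_path_cost(reused), register_path_cost(reused))
print(copy_path_cost(fresh), register_path_cost(fresh))
```

With these made-up numbers, one buffer reused 1000 times costs a single registration (20,000 ns) against 100,000 ns of copying. With 1000 distinct buffers, the register path pays 1000 registrations plus 936 evictions, far more than the copy path.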
Besides the overhead of memory registration, is there any other reason
that prevents you from doing RDMA directly from the user buffer for short