I am a new user of Open MPI; I have used MPICH before.
I asked on the users list, but they couldn't help me.
There is a performance bug in the following scenario:
For a message size of 8 MB, proc_B calls MPI_Test 88 times. This means that
the point-to-point communication takes 88 seconds.
By the way, bandwidth isn't the problem (the interconnect is InfiniBand).
Clearly, there is a problem with the progress of asynchronous
messages. I don't want to use MPI_Wait, because I want to overlap
communication and computation. The message is probably being split into
chunks, and the chunk size is probably set by an environment variable.
How can I advance the message more aggressively, or how can I control the chunk size?
Thank you very much.