If I have a master process that needs to send a different chunk of data
to each of my N slave processes as fast as possible, would each slave
receive its chunk sooner if the master launched N threads, each doing a
blocking send, or would it be better for the master to post N
nonblocking sends?
I'm currently using OpenMPI over Ethernet, but might the best approach
be different on other types of networks?
Thanks in advance,