Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Simple program (103 lines) makes Open-1.4.3 hang
From: Eugene Loh (eugene.loh_at_[hidden])
Date: 2010-11-23 15:55:36

To add to Jeff's comments:

Sébastien Boisvert wrote:

>The reason is that I am developing MPI-based software, and I use
>Open-MPI as it is the only implementation I am aware of that sends
>messages eagerly (powerful feature, that is).
As wonderful as OMPI is, I am fairly sure other MPI implementations also
support eager message passing. That is, there is a capability for a
sender to hand message data over to the MPI implementation, freeing the
user send buffer and allowing an MPI_Send() call to complete, without
the message reaching the receiver or the receiver being ready.

>Each byte transfer layer has its own default limit for sending a
>message eagerly. With shared memory (sm), the value is 4096 bytes. At
>least it is according to ompi_info.
Yes. I think that 4096 bytes can be a little tricky... it may include
some header information. So, the amount of user data that could be sent
would be a little bit less... e.g., 4,000 bytes or so.
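
To make that arithmetic concrete, here is a minimal sketch in plain C (no MPI required). It only models the idea that header plus payload must fit under the eager limit. The 64-byte header size and the helper names max_eager_payload and fits_eager are assumptions made up for illustration; they are not Open MPI values or APIs, and the real overhead depends on the transport and version.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-message header size in bytes. This is an assumption for
 * illustration only; the real overhead depends on the BTL and OMPI version. */
#define ASSUMED_HEADER_BYTES 64

/* Largest user payload that still fits under the eager limit,
 * or 0 if the header alone already exceeds the limit. */
static size_t max_eager_payload(size_t eager_limit, size_t header_bytes) {
  return eager_limit > header_bytes ? eager_limit - header_bytes : 0;
}

/* True if header + payload fit within the eager limit under this model. */
static bool fits_eager(size_t payload, size_t eager_limit, size_t header_bytes) {
  return header_bytes <= eager_limit && payload <= eager_limit - header_bytes;
}
```

Under these assumptions, fits_eager(4000, 4096, ASSUMED_HEADER_BYTES) is true while fits_eager(4096, 4096, ASSUMED_HEADER_BYTES) is false, which matches the intuition that a 4096-byte user buffer does not quite fit under a 4096-byte eager limit.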

>To verify this limit, I implemented a very simple test. The source code
>is test4096.cpp, which basically just sends a single message of 4096
>bytes from one rank to another (rank 1 to 0).
I don't think the test says much at all. It has one process post an
MPI_Send and another post an MPI_Recv. Such a test should complete
under a very wide range of conditions.

Here is perhaps a better test:

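#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
  int me, np;
  char buf[N] = {0};

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &me);
  MPI_Comm_size(MPI_COMM_WORLD, &np);

  /* Both ranks send first, then receive; run with exactly 2 processes. */
  MPI_Send(buf, N, MPI_CHAR, 1 - me, 0, MPI_COMM_WORLD);
  MPI_Recv(buf, N, MPI_CHAR, 1 - me, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

  printf("%d of %d done\n", me, np);
  MPI_Finalize();
  return 0;
}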

Compile with the preprocessor symbol N defined to, say, 64 (e.g.,
-DN=64). Run with --np 2. Each process will try to send. The code will complete for
short, eager messages. If the messages are long, nothing is sent
eagerly and both processes stay hung in their sends. Bump N up slowly.
For N=4096, the code hangs. For N slightly less -- say, 4000 -- it runs.