Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Simple program (103 lines) makes Open-1.4.3 hang
From: Sébastien Boisvert (Sebastien.Boisvert.3_at_[hidden])
Date: 2010-11-23 17:29:44


On Tuesday, 23 November 2010 at 16:07 -0500, Eugene Loh wrote:
> Sébastien Boisvert wrote:
>
> >Now I can describe the cases.
> >
> >
> The test cases can all be explained by the test requiring eager messages
> (something that test4096.cpp does not require).
>
> >Case 1: 30 MPI ranks, message size is 4096 bytes
> >
> >File: mpirun-np-30-Program-4096.txt
> >Outcome: It hangs -- I killed the poor thing after 30 seconds or so.
> >
> >
> 4096 is rendezvous. For eager, try 4000 or lower.

According to ompi_info, the threshold is 4096, not 4000, right?

(Open-MPI 1.4.3)
[sboisver12_at_colosse1 ~]$ ompi_info -a|less
                 MCA btl: parameter "btl_sm_eager_limit" (current value: "4096", data source: default value)
                          Maximum size (in bytes) of "short" messages (must be >= 1).

"btl_sm_eager_limit: Below this size, messages are sent "eagerly" --
that is, a sender attempts to write its entire message to shared buffers
without waiting for a receiver to be ready. Above this size, a sender
will only write the first part of a message, then wait for the receiver
to acknowledge its ready before continuing. Eager sends can improve
performance by decoupling senders from receivers."

source:
http://www.open-mpi.org/faq/?category=sm#more-sm

The FAQ should say "Below this size or equal to this size" instead of
"Below this size", to match ompi_info, which describes the limit as a
maximum size. ;)

As Mr. George Bosilca put it:

"__should__ is not correct, __might__ is a better verb to describe the
most "common" behavior for small messages. The problem comes from the
fact that in each communicator the FIFO ordering is required by the MPI
standard. As soon as there is any congestion, MPI_Send will block even
for small messages (and this is independent of the underlying network)
until all the pending packets have been delivered."

source:
http://www.open-mpi.org/community/lists/devel/2010/11/8696.php

>
> >Case 2: 30 MPI ranks, message size is 1 byte
> >
> >File: mpirun-np-30-Program-1.txt.gz
> >Outcome: It runs just fine.
> >
> >
> 1 byte is eager.

I agree.

>
> >Case 3: 2 MPI ranks, message size is 4096 bytes
> >
> >File: mpirun-np-2-Program-4096.txt
> >Outcome: It hangs -- I killed the poor thing after 30 seconds or so.
> >
> >
> Same as Case 1.
>
> >Case 4: 30 MPI ranks, message size is 4096 bytes, shared memory is
> >disabled
> >
> >File: mpirun-mca-btl-^sm-np-30-Program-4096.txt.gz
> >Outcome: It runs just fine.
> >
> >
> Eager limit for TCP is 65536 (perhaps less some overhead). So, these
> messages are eager.

I agree.
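
To summarize the four cases, here is a minimal sketch in plain C++ (no MPI
calls). It assumes that a message is sent eagerly only when its payload plus
a per-message header fits within the BTL's eager limit. The names `classify`
and `kAssumedHeader`, and the 64-byte header value, are hypothetical and only
for illustration; the real header size is not given in this thread. The two
limits come from the ompi_info output (4096 for sm) and from Eugene's reply
(65536 for TCP).

```cpp
#include <cstddef>

// Illustrative model only, not Open MPI code.
enum class Protocol { Eager, Rendezvous };

// Limits quoted in this thread.
constexpr std::size_t kSmEagerLimit  = 4096;   // btl_sm_eager_limit default
constexpr std::size_t kTcpEagerLimit = 65536;  // TCP eager limit per Eugene

// ASSUMPTION: placeholder header size, not the actual Open MPI value.
constexpr std::size_t kAssumedHeader = 64;

// A message is eager when payload + header fits in the eager limit.
// The first test avoids unsigned underflow if the header exceeds the limit.
constexpr Protocol classify(std::size_t payload_bytes,
                            std::size_t eager_limit,
                            std::size_t header_bytes) {
    return (header_bytes <= eager_limit &&
            payload_bytes <= eager_limit - header_bytes)
               ? Protocol::Eager
               : Protocol::Rendezvous;
}

// Cases 1 and 3: 4096 bytes over shared memory -> rendezvous (hang observed).
static_assert(classify(4096, kSmEagerLimit, kAssumedHeader) == Protocol::Rendezvous,
              "cases 1 and 3");
// Case 2: 1 byte over shared memory -> eager (runs fine).
static_assert(classify(1, kSmEagerLimit, kAssumedHeader) == Protocol::Eager,
              "case 2");
// Case 4: 4096 bytes with sm disabled (TCP) -> eager (runs fine).
static_assert(classify(4096, kTcpEagerLimit, kAssumedHeader) == Protocol::Eager,
              "case 4");
// With no header overhead, 4096 bytes would fit the sm limit exactly.
static_assert(classify(4096, kSmEagerLimit, 0) == Protocol::Eager,
              "limit read as inclusive, no overhead");
```

Under this assumption, any nonzero header is enough to push a 4096-byte
payload over the sm limit, which would be consistent with the advice to try
4000 bytes or lower.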
