Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] delays in Isend
From: Bennet Fauber (bennet_at_[hidden])
Date: 2014-03-22 17:44:30


Hi, Ross,

Just out of curiosity, is Rmpi required for some package that you're
using? I only ask because, if you're mostly writing your own MPI
calls, you might want to look at pbdR/pbdMPI, if you haven't already.
The same project also provides pbdPROF, which should be able to do
some profiling of MPI calls.

http://rbigdata.github.io/packages.html

I wasn't sure whether this was really on topic for the list, so I'm
sending it privately. Sorry for the extra noise if you've already
eliminated pbdR as a possibility.

-- bennet

On Sat, Mar 22, 2014 at 3:46 PM, Ross Boylan <ross_at_[hidden]> wrote:
> I have a bunch of simulators communicating results to a single
> assembler. The results seem to take a long time to be received, and the
> delay increases as the system runs. Here are some results:
>
>     sent   received      delay
>   70.679     94.776     24.097
>   94.677    144.906     50.229
>  122.082    238.713    116.631
>  144.785    313.101    168.316
>  167.918    350.037    182.119
>  190.709    384.342    193.633
> Times are wall-clock times in seconds since process launch, so there
> may be some skew between sender and receiver, but it will be consistent
> (this tracks only sends from one simulator and also ignores later sends
> that never arrived--my completion logic needs work).
>
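
The delay column above is simply received minus sent, and the growth
from one send to the next can be checked the same way. A minimal
Python sketch, using only the six rows quoted above (the variable
names are illustrative and not from the original post):

```python
# (sent, received) wall-clock times in seconds, copied from the table above.
rows = [
    (70.679, 94.776),
    (94.677, 144.906),
    (122.082, 238.713),
    (144.785, 313.101),
    (167.918, 350.037),
    (190.709, 384.342),
]

# Delay for each message: time received minus time sent.
delays = [round(received - sent, 3) for sent, received in rows]

# How much the delay grew from one message to the next.
growth = [round(b - a, 3) for a, b in zip(delays, delays[1:])]

for (sent, received), delay in zip(rows, delays):
    print(f"{sent:8.3f} {received:8.3f} {delay:8.3f}")
print("growth between successive sends:", growth)
```

The delay rises with every send, though not by a constant amount,
which fits the description of a backlog that builds up as the run
goes on.
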
> The results are typically 500kB. Sending is via Isend (non-blocking)
> and receiving via Recv (blocking). The simulators spend most of their
> time computing; in particular there may be significant delays, e.g.,
> from 10 seconds to a minute, between calls to MPI (typically a mix of
> Isend, Recv, and Testsome). All processes are on the same machine (for
> now).
>
> The interval between assembler receives (from all sources) is sometimes
> quite brief, under 2 seconds, and the time between receives is quite
> variable. Neither is consistent with the theory that the receiver is
> saturated receiving messages, each of which takes a long time to
> transmit (I mean the active part of the transmission, when bits are
> flowing). I infer from this that actually transmitting the message does
> not take long, and that the longer gaps between receives have some other
> cause.
>
> This is all from R, and the problem might lie with higher level code.
>
> Can anyone explain what is going on, and what I might do to alleviate
> it?
>
> My speculation is that the necessary handshaking can only take place
> while both processes have called MPI, or perhaps some particular calls
> are required. The assembler spends most of its time executing a
> receive, but the simulators are mostly busy with other stuff. And so I
> suspect the delay is with the simulators, though I'm not sure what to do
> about it. I could wait on completion from the sender, but that kind of
> defeats the purpose of doing an isend.
>
> In a related thread about a similar issue, Jeff Squyres wrote
> (http://www.open-mpi.org/community/lists/users/2011/07/16928.php)
> ----------------------------------------------------
> If so, it's because Open MPI does not do background progress on
> non-blocking sends in all cases. Specifically, if you're sending over
> TCP and the message is "long", the OMPI layer in the master doesn't
> actually send the whole message immediately because it doesn't want to
> unexpectedly consume a lot of resources in the slave. So the master
> only sends a small fragment of the message and the communicator,tag
> tuple suitable for matching at the receiver. When the receiver posts a
> corresponding MPI_Recv (time=C), it sends back an ACK to the master,
> enabling the master to send the rest of the message.
>
> However, since OMPI doesn't support background progress in all
> situations, the master doesn't see this ACK until it goes into the MPI
> progression engine -- i.e., when you call MPI_Recv() at Time=E. Then
> the OMPI layer in the master sees the ACK and sends the rest of the
> message.
> ----------------------------------------------------------------
>
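
The mechanism in the quoted passage can be illustrated with a toy
model. This is only a sketch under stated assumptions: it is not Open
MPI code, the function name and its parameters are hypothetical, and
it assumes the sender notices the receiver's ACK only when it next
calls into MPI. Whether the same thing happens on the shared-memory
path used here is not established by this thread.

```python
def payload_start_time(isend_time, recv_post_time, sender_mpi_call_times):
    """Toy model of a rendezvous send without background progress.

    Assumptions (hypothetical, for illustration only):
      * the small first fragment arrives instantly;
      * the receiver ACKs as soon as both the Isend and the matching
        Recv have been posted;
      * the sender sees the ACK only at its next MPI call after the
        Isend, and starts moving the rest of the message then.

    Returns the time the bulk of the message starts to move, or None
    if the sender never calls MPI again.
    """
    ack_time = max(isend_time, recv_post_time)
    for t in sorted(sender_mpi_call_times):
        if t > isend_time and t >= ack_time:
            return t
    return None


# Sender posts Isend at t=0, receiver posts Recv at t=1.
# A busy simulator that next calls MPI at t=30 and t=60:
busy = payload_start_time(0.0, 1.0, [30.0, 60.0])

# The same simulator polling MPI every 2 seconds while it computes:
polling = payload_start_time(0.0, 1.0, [2.0 * k for k in range(1, 31)])

# A simulator that never calls MPI again:
never = payload_start_time(0.0, 1.0, [])

print(busy, polling, never)
```

In this model the receiver being ready early does not help; the
transfer waits for the sender's next MPI call. If something similar is
happening here, calling Testsome (or another MPI call) more often from
inside the simulators' compute loop would be one thing to try.
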
> I'm not sending over TCP (yet) but maybe I'm running into something
> similar.
>
> I had thought the MPI stuff was handled in a separate layer or thread that
> would magically do all the work of moving messages around; the fact that
> top shows all the CPU going to the R processes suggests that's not the
> case.
>
> Running OMPI 1.7.4.
>
> Thanks for any help.
> Ross Boylan