Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Question about Lost Messages
From: George Bosilca (bosilca_at_[hidden])
Date: 2012-12-21 13:22:51


Corey,

The communication pattern looks legitimate, but it is very difficult to see what is going wrong without code to look at. Can you provide a simple test case (maybe the skeleton of your application) that we can work from?

  George.

On Dec 20, 2012, at 22:07, Corey Allen <corey.allen_at_[hidden]> wrote:

> Hello,
>
> I am trying to confirm that I am using OpenMPI in a correct way. I
> seem to be losing messages but I don't like to assume there's a bug
> when I'm still new to MPI in general.
>
> I have multiple processes in a master / slaves type setup, and I am
> trying to have multiple persistent non-blocking message requests
> between them to prevent starvation. (Tech detail: 4-core Intel running
> Ubuntu 64-bit and OpenMPI 1.4. Everything is local. Total processes is
> 5. One master, four slaves. The problem only surfaces on the slowest
> slave - the one with the most work.)
>
> The setup is like this:
>
> Master:
>
>   Create 3 persistent send requests, with three different buffers (in a 2D array)
>   Load data into each buffer
>   Start each send request
>   In a loop:
>     TestSome on the 3 sends
>     for each send that's completed
>       load new data into the buffer
>       restart that send
>   loop
>
> Slave:
>
>   Create 3 persistent receive requests, with three different buffers (in a 2D array)
>   Start each receive request
>   In a loop:
>     WaitAny on the 3 receives
>     Consume data from the one receive buffer from WaitAny
>     Start that receive again
>   loop
>
> Basically what I'm seeing is that the master gets a "completed" send
> request from TestSome and loads new data, restarts, etc. but the slave
> never sees that particular message. I was under the impression that
> WaitAny should return only one message but also should eventually
> return every message sent in this situation.
>
> I am operating under the assumption that even if the send request is
> completed and the buffer overwritten in the master, the receive for
> that message eventually occurs with the correct data in the slave. I
> did not think I had to advise the master that the slave was finished
> reading data out of the receive buffer before the master could reuse
> the send buffer.
>
> What it LOOKS like to me is that WaitAny is marking more than one send
> completed, so the master sends the next message, but I can't see it in
> the slave.
>
> I hope this is making sense. Any input on whether I'm doing this wrong
> or a way to see if the message is really being lost would be helpful.
> If there's a good example code of multiple simultaneous asynchronous
> messages to avoid starvation that is set up better than my approach,
> I'd like to see it.
>
> Thanks!
>
> Corey