The communication pattern looks legitimate, but it is very difficult to see what is going wrong without code to look at. Can you provide a simple test case (maybe the skeleton of your application) that we can work from?
On Dec 20, 2012, at 22:07, Corey Allen <corey.allen_at_[hidden]> wrote:
> I am trying to confirm that I am using OpenMPI in a correct way. I
> seem to be losing messages but I don't like to assume there's a bug
> when I'm still new to MPI in general.
> I have multiple processes in a master / slaves type setup, and I am
> trying to have multiple persistent non-blocking message requests
> between them to prevent starvation. (Tech detail: 4-core Intel running
> Ubuntu 64-bit and OpenMPI 1.4. Everything is local. Total processes is
> 5. One master, four slaves. The problem only surfaces on the slowest
> slave - the one with the most work.)
> The setup is like this:
>
> Create 3 persistent send requests, with three different buffers (in a 2D array)
> Load data into each buffer
> Start each send request
> In a loop:
>     TestSome on the 3 sends
>     for each send that's completed
>         load new data into the buffer
>         restart that send
>
> Create 3 persistent receive requests, with three different buffers (in a 2D array)
> Start each receive request
> In a loop:
>     WaitAny on the 3 receives
>     Consume data from the one receive buffer from WaitAny
>     Start that receive again
>
> Basically what I'm seeing is that the master gets a "completed" send
> request from TestSome and loads new data, restarts, etc. but the slave
> never sees that particular message. I was under the impression that
> WaitAny should return only one message but also should eventually
> return every message sent in this situation.
> I am operating under the assumption that even if the send request is
> completed and the buffer overwritten in the master, the receive for
> that message eventually occurs with the correct data in the slave. I
> did not think I had to advise the master that the slave was finished
> reading data out of the receive buffer before the master could reuse
> the send buffer.
> What it LOOKS like to me is that WaitAny is marking more than one send
> completed, so the master sends the next message, but I can't see it in
> the slave.
> I hope this is making sense. Any input on whether I'm doing this wrong
> or a way to see if the message is really being lost would be helpful.
> If there's a good example code of multiple simultaneous asynchronous
> messages to avoid starvation that is set up better than my approach,
> I'd like to see it.
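
One way to check whether a message is really being lost is to number the messages and record on the receiving side which numbers arrive. The following is only a sketch of that bookkeeping, not part of the original code. It assumes the master writes a sequence number (0, 1, 2, ...) into each buffer before restarting the send, and that the total message count is known in advance. The names seq_tracker, seq_tracker_init, seq_tracker_mark, seq_tracker_missing and the MAX_MSGS bound are all hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_MSGS 1024  /* hypothetical upper bound on messages per run */

/* Records which sequence numbers the receiver has consumed. */
typedef struct {
    unsigned char seen[MAX_MSGS]; /* seen[i] != 0 once message i was consumed */
    int total;                    /* number of messages the sender will issue */
    int duplicates;               /* sequence numbers consumed more than once */
} seq_tracker;

/* Prepare the tracker for a run of `total` messages. */
void seq_tracker_init(seq_tracker *t, int total)
{
    assert(total >= 0 && total <= MAX_MSGS);
    memset(t->seen, 0, sizeof t->seen);
    t->total = total;
    t->duplicates = 0;
}

/* Call once per completed receive with the sequence number read from
 * the receive buffer. Returns 0 on success, -1 if seq is out of range. */
int seq_tracker_mark(seq_tracker *t, int seq)
{
    if (seq < 0 || seq >= t->total)
        return -1;
    if (t->seen[seq])
        t->duplicates++;
    t->seen[seq] = 1;
    return 0;
}

/* Returns how many sequence numbers were never seen. If any are missing
 * and `first` is not NULL, the lowest missing number is stored in *first. */
int seq_tracker_missing(const seq_tracker *t, int *first)
{
    int missing = 0;
    for (int i = 0; i < t->total; i++) {
        if (!t->seen[i]) {
            if (missing == 0 && first)
                *first = i;
            missing++;
        }
    }
    return missing;
}
```

In the slave, seq_tracker_mark would be called right after each WaitAny returns, and seq_tracker_missing once the run is over. A table of seen numbers is used rather than a "previous + 1" check because, with three receives outstanding, WaitAny may hand back completed receives in a different order than the messages were sent.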