Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Messages getting lost during transmission (?)
From: Eugene Loh (Eugene.Loh_at_[hidden])
Date: 2009-09-10 12:53:19


Dennis Luxen wrote:

>> In MPI, you must complete every MPI_Isend by MPI_Wait on the request
>> handle (or a variant like MPI_Waitall or MPI_Test that returns TRUE).
>> An un-completed MPI_Isend leaves resources tied up.
>
> Good point, but that doesn't seem to help. I augmented each MPI_Isend
> with an MPI_Wait.

What does that mean, exactly? Did you follow each Isend immediately with
a Wait? If so, that is equivalent to replacing each Isend with a Send.

In your original message, you said each process started by sending a
100K request. If that's the case, and you have blocking sends (or
Isends immediately followed by Waits), you're not guaranteed progress:
each process may block in its send until the matching receive is posted,
and no process gets far enough to post one. E.g., consider the last
example in http://www.mpi-forum.org/docs/mpi-11-html/node41.html#Node41 .
But your example code sends only single-int requests, so this shouldn't
be an issue for your sample code.
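
To illustrate the progress issue, here is a toy model in plain Python.
It is not MPI code: the Link class, the exchange function, and the
EAGER_LIMIT value are all made up for this sketch, and real eager limits
depend on the transport and its settings. The model assumes that a
blocking send returns at once for a small message but, for a large one,
returns only after the peer has received it.

```python
import queue
import threading

EAGER_LIMIT = 4096  # hypothetical threshold in bytes, not a real Open MPI default


class Link:
    """Toy one-way channel carrying the messages addressed to one rank."""

    def __init__(self):
        self._q = queue.Queue()

    def send(self, payload, timeout):
        """Blocking send; returns False if still blocked after `timeout`."""
        received = threading.Event()
        self._q.put((payload, received))
        if len(payload) <= EAGER_LIMIT:
            return True  # small message: buffered, sender continues
        return received.wait(timeout)  # large message: wait for the receiver

    def recv(self, timeout):
        """Blocking receive; returns None if nothing arrives within `timeout`."""
        try:
            payload, received = self._q.get(timeout=timeout)
        except queue.Empty:
            return None
        received.set()  # release a sender waiting on this message
        return payload


def exchange(size, rank1_receives_first, timeout=1.0):
    """Two ranks swap `size`-byte messages; returns True if both complete."""
    inbox = [Link(), Link()]  # inbox[r] holds messages addressed to rank r
    ok = [False, False]

    def rank(me):
        peer = 1 - me
        msg = bytes(size)
        if me == 1 and rank1_receives_first:
            got = inbox[me].recv(timeout)
            sent = inbox[peer].send(msg, timeout)
        else:
            sent = inbox[peer].send(msg, timeout)
            if not sent:
                return  # stand-in for a process stuck in its send
            got = inbox[me].recv(timeout)
        ok[me] = sent and got is not None

    threads = [threading.Thread(target=rank, args=(r,)) for r in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return all(ok)
```

In this model, exchange(100_000, rank1_receives_first=False) returns
False because both ranks stay blocked in their sends, while
exchange(100_000, rank1_receives_first=True) returns True. With
single-int messages, exchange(4, rank1_receives_first=False) also
returns True, since 4 bytes falls under the assumed eager limit.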

Anyhow, I ran your sample code and it hung. Then I replaced the Isends
with Sends and it ran to completion. So, with blocking sends, I am as
yet unable to reproduce your problem.

> Now, one process keeps hanging after a number of messages in MPI_Wait
> and the other one keeps MPI_Iprobe'ing for new messages to receive.
>
>> I do not know what symptom to expect from Open MPI with this particular
>> application error but the one you describe is plausible.
>
> If I start with the parameter "--mca btl tcp,self" on the other hand,
> the processes finish communication just fine. I am not exactly sure
> why this flag helps.