Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Messages getting lost during transmission (?)
From: Eugene Loh (Eugene.Loh_at_[hidden])
Date: 2009-09-10 12:53:19


Dennis Luxen wrote:

>> In MPI, you must complete every MPI_Isend by MPI_Wait on the request
>> handle (or a variant like MPI_Waitall or MPI_Test that returns TRUE).
>> An un-completed MPI_Isend leaves resources tied up.
>
> Good point, but that doesn't seem to help. I augmented each MPI_Isend
> with a MPI_Wait.

What does that mean? Does that mean you immediately followed each Isend
with a Wait? Equivalently, did you replace each Isend with a Send?

In your original message, you said each process started by sending a
100K request. If that's the case, and you have blocking sends (or
Isends augmented with Waits), you're not guaranteed progress: if every
process is stuck in a send that the implementation chooses not to
buffer, no one ever reaches its receive. E.g., consider the last
example in
http://www.mpi-forum.org/docs/mpi-11-html/node41.html#Node41 . But your
example code sends only single-int requests. So, this shouldn't be an
issue for your sample code.

Anyhow, I ran your sample code and it hung. Then I replaced the Isends
with Sends and it ran. So, at that level, I am as yet unable to
reproduce the hang you still see after adding the Waits.

> Now, one process keeps hanging after a number of messages in MPI_Wait
> and the other one keeps MPI_Iprobe'ing for new messages to receive.
>
>> I do not know what symptom to expect from OpenMPI with this particular
>> application error but the one you describe is plausible.
>
> If I start with the parameter "--mca btl tcp,self" on the other hand,
> the processes finish communication just fine. I am not exactly sure
> why this flag helps.