Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OpenMPI 1.2.5 race condition / core dump with MPI_Reduce and MPI_Gather
From: George Bosilca (bosilca_at_[hidden])
Date: 2008-02-28 16:53:11


In this particular case, I don't think the solution is that obvious.
If you look at the stack in the original email, you will see how we
get into this. The problem here is that FREE_LIST_WAIT is used to
get a fragment to store an unexpected message. If this macro returns
NULL (in other words, the PML is unable to store the unexpected
message), what do you expect to happen? Drop the message? Ask the
BTL to hold it for a while? How about ordering?
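
To make the question concrete, the following is a minimal sketch in C of
a receive callback that takes a fragment from a fixed pool with a
non-blocking get. Every name in it (frag_pool_t, pool_get, on_unexpected,
the pool size of 2) is hypothetical and is not the Open MPI free-list
API. It only shows where the NULL case lands: the callback can report
that it has no fragment, but it has no good way to handle that itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_SIZE   2    /* deliberately tiny so exhaustion is easy to hit */
#define PAYLOAD_MAX 64

typedef struct frag {
    struct frag *next;
    size_t       len;
    char         payload[PAYLOAD_MAX];
} frag_t;

typedef struct {
    frag_t  storage[POOL_SIZE];
    frag_t *free_head;        /* fragments still available */
    frag_t *unexpected_head;  /* stored unexpected messages, oldest first */
    frag_t *unexpected_tail;
} frag_pool_t;

typedef enum { MSG_STORED, MSG_NO_FRAGMENT } recv_status_t;

static void pool_init(frag_pool_t *p)
{
    p->free_head = NULL;
    p->unexpected_head = p->unexpected_tail = NULL;
    for (int i = 0; i < POOL_SIZE; i++) {
        p->storage[i].next = p->free_head;
        p->free_head = &p->storage[i];
    }
}

/* Non-blocking get: returns NULL instead of waiting (and progressing). */
static frag_t *pool_get(frag_pool_t *p)
{
    frag_t *f = p->free_head;
    if (f != NULL) {
        p->free_head = f->next;
    }
    return f;
}

/* Hypothetical callback invoked by the lower layer for an unexpected message. */
static recv_status_t on_unexpected(frag_pool_t *p, const char *data, size_t len)
{
    assert(p != NULL && data != NULL);

    frag_t *f = pool_get(p);
    if (f == NULL) {
        /* The open question: drop it? ask the lower layer to hold it?
         * Either choice has to preserve message ordering somehow. */
        return MSG_NO_FRAGMENT;
    }

    f->len = len < PAYLOAD_MAX ? len : PAYLOAD_MAX;
    memcpy(f->payload, data, f->len);

    /* Append at the tail so stored messages keep their arrival order. */
    f->next = NULL;
    if (p->unexpected_tail != NULL) {
        p->unexpected_tail->next = f;
    } else {
        p->unexpected_head = f;
    }
    p->unexpected_tail = f;
    return MSG_STORED;
}
```

With a pool of two fragments, the third unexpected message gets
MSG_NO_FRAGMENT, and whoever called the callback now owns the problem.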

It is unfortunate to say it, only a few days after we had the discussion
about flow control, but the only correct solution here is to add
PML-level flow control ...
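
As an illustration only, one common form of flow control is a
credit-based scheme: the sender may have at most N unacknowledged eager
messages in flight to a peer, and holds further sends locally until the
receiver returns credits. The sketch below is in C and is an assumption
about what such a scheme could look like, not a description of any
Open MPI design; peer_flow_t and the flow_* functions are hypothetical
names.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int credits;   /* eager messages the peer has agreed to buffer */
    int deferred;  /* sends held back locally, released in FIFO order */
} peer_flow_t;

static void flow_init(peer_flow_t *p, int initial_credits)
{
    assert(initial_credits >= 0);
    p->credits  = initial_credits;
    p->deferred = 0;
}

/* Returns true if the message may go on the wire now. A send is also
 * deferred when older sends are still waiting, so ordering is kept. */
static bool flow_try_send(peer_flow_t *p)
{
    if (p->credits == 0 || p->deferred > 0) {
        p->deferred++;
        return false;
    }
    p->credits--;
    return true;
}

/* The receiver freed n fragments and told us so. Returns how many
 * deferred sends can now be released, oldest first. */
static int flow_credit_return(peer_flow_t *p, int n)
{
    int released = 0;
    assert(n >= 0);
    p->credits += n;
    while (p->deferred > 0 && p->credits > 0) {
        p->deferred--;
        p->credits--;
        released++;
    }
    return released;
}
```

The point of the scheme is that the receiver never sees more unexpected
messages than it has fragments for, so the NULL case cannot occur
inside the receive callback.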

   george.

On Feb 28, 2008, at 2:55 PM, Christian Bell wrote:

> On Thu, 28 Feb 2008, Gleb Natapov wrote:
>
>> The trick is to call progress only from functions that are called
>> directly by a user process. Never call progress from callback
>> functions.
>> The main offenders of this rule are calls to OMPI_FREE_LIST_WAIT().
>> They should be changed to OMPI_FREE_LIST_GET() and deal with a NULL
>> return value.
>
> Right -- and it should be easy to find more offenders by having an
> assert statement soak in the builds for a while (or by default in
> debug mode).
>
> Was it ever part of the (or a) design to allow re-entrant
> calls to progress from the same calling thread? It can be done, but
> callers have to have a holistic view of how other components require
> and make progress happen -- this isn't compatible with the Open
> MPI model of independent dynamically loadable components.
>
> --
> christian.bell_at_[hidden]
> (QLogic Host Solutions Group, formerly Pathscale)
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
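
The assert suggested in the quoted message, one that catches re-entrant
calls to progress, could be sketched in C roughly as follows. This is a
single-threaded illustration under assumed names (progress,
progress_depth, well_behaved_callback are all hypothetical); a real
build would need a per-thread counter and would hook into the actual
progress function.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*callback_fn)(void);

/* Nesting depth of progress(); per-thread in a threaded build. */
static int progress_depth = 0;

/* Counts callback invocations, for illustration only. */
static int callback_runs = 0;

/* A callback that obeys the rule: it does its work and returns
 * without calling progress() itself. */
static void well_behaved_callback(void)
{
    callback_runs++;
}

/* Stand-in for the progress engine: it runs one callback. The assert
 * fires (in debug builds) if a callback re-enters progress(). */
static void progress(callback_fn cb)
{
    assert(progress_depth == 0 && "progress re-entered from a callback");
    progress_depth++;
    if (cb != NULL) {
        cb();
    }
    progress_depth--;
}
```

A callback that called progress() directly, or indirectly through a
blocking wait on a free list, would trip the assert on the nested call,
which is how such a check would find offenders over time.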


