Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Deadlock on large numbers of processors
From: Tim Mattox (timattox_at_[hidden])
Date: 2009-01-12 13:30:03


Hi Justin,
I applied the fixes for this particular deadlock to the 1.3 code base
late last week, see ticket #1725:
https://svn.open-mpi.org/trac/ompi/ticket/1725

This should fix the described problem, but I personally have not tested
to see if the deadlock in question is now gone. Everyone should give
thanks to George for his efforts in tracking down the problem
and finding a solution.
  -- Tim Mattox, the v1.3 gatekeeper

On Mon, Jan 12, 2009 at 12:46 PM, Justin <luitjens_at_[hidden]> wrote:
> Hi, has this deadlock been fixed in the 1.3 source yet?
>
> Thanks,
>
> Justin
>
>
> Jeff Squyres wrote:
>>
>> On Dec 11, 2008, at 5:30 PM, Justin wrote:
>>
>>> The more I look at this bug, the more I'm convinced it is in Open MPI and
>>> not in our code. Here is why: our code generates a communication/execution
>>> schedule. At each timestep this schedule is executed, and all communication
>>> and execution is performed. Our problem is AMR, which means the
>>> communication schedule may change from time to time. In this case the
>>> schedule has not changed in many timesteps, meaning the same communication
>>> schedule has been used for the last X timesteps (X being around 20 in this
>>> case).
>>> Our code does have a very large communication volume. I have been able
>>> to reduce the hang down to 16 processors, and it seems to me the hang occurs
>>> when we have lots of work per processor. That is, if I add more processors
>>> it may not hang, but reducing the number of processors makes it more likely
>>> to hang.
>>> What is the status of the fix for this particular freelist deadlock?
>>
>>
>> George is actively working on it because it is the "last" issue blocking
>> us from releasing v1.3. I fear that if he doesn't get it fixed by tonight,
>> we'll have to push v1.3 to next year (see
>> http://www.open-mpi.org/community/lists/devel/2008/12/5029.php and
>> http://www.open-mpi.org/community/lists/users/2008/12/7499.php).
>>
>

-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmattox_at_[hidden] || timattox_at_[hidden]
    I'm a bright... http://www.the-brights.net/