
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Deadlock on large numbers of processors
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-12-11 16:46:10


George --

Is this the same issue that you're working on?

(we have a "blocker" bug for v1.3 about deadlock at heavy messaging
volume -- on Tuesday, it looked like a bug in our freelist...)

On Dec 9, 2008, at 10:28 AM, Justin wrote:

> I have tried disabling shared memory by running with the
> following parameters to mpirun:
>
> --mca btl openib,self --mca btl_openib_ib_timeout 23 --mca
> btl_openib_use_srq 1 --mca btl_openib_use_rd_max 2048
>
> Unfortunately this did not get rid of any hangs and seems to have
> made them more common. I have now been able to reproduce the
> deadlock at 32 processors. I am now working with an MPI deadlock
> detection research code which will hopefully be able to tell me
> whether there are any deadlocks in our code. At the same time, if
> any of you have any suggestions of Open MPI parameters that might
> alleviate these deadlocks, I would be grateful.
>
>
> Thanks,
> Justin
>
>
>
>
> Rolf Vandevaart wrote:
>>
>> The current version of Open MPI installed on Ranger is 1.3a1r19685,
>> which is from early October. This version has a fix for ticket
>> #1378. Ticket #1449 is not an issue in this case because each node
>> has 16 processors and #1449 is for larger SMPs.
>>
>> However, I am wondering if this is because of ticket https://svn.open-mpi.org/trac/ompi/ticket/1468
>> which was not yet fixed in the version running on Ranger.
>>
>> As was suggested earlier, running without the sm BTL would give a
>> clue as to whether this is the problem.
>>
>> mpirun --mca btl ^sm a.out
>>
>> Another way to potentially work around the issue is to increase the
>> size of the shared memory backing file.
>>
>> mpirun -mca mpool_sm_max_size 1073741824 a.out
>>
>> We will also work with TACC to get an upgraded version of Open MPI
>> 1.3 on there.
>>
>> Let us know what you find.
>>
>> Rolf
>>
>>
>> On 12/09/08 08:05, Lenny Verkhovsky wrote:
>>> also see https://svn.open-mpi.org/trac/ompi/ticket/1449
>>>
>>>
>>>
>>> On 12/9/08, *Lenny Verkhovsky* <lenny.verkhovsky_at_[hidden]> wrote:
>>>
>>> maybe it's related to https://svn.open-mpi.org/trac/ompi/ticket/1378 ?
>>>
>>>
>>> On 12/5/08, *Justin* <luitjens_at_[hidden]> wrote:
>>>
>>> The reason I'd like to disable these eager buffers is to help
>>> detect the deadlock better. I would not run with this for a
>>> normal run, but it would be useful for debugging. If the
>>> deadlock is indeed due to our code, then disabling any shared
>>> buffers or eager sends would make that deadlock
>>> reproducible. In addition we might be able to lower the
>>> number of processors. Right now, determining which processor
>>> is deadlocked when we are using 8K cores and each processor
>>> has hundreds of messages sent out would be quite difficult.
>>>
>>> Thanks for your suggestions,
>>> Justin
>>>
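[Editor's note: to make the debugging idea above concrete -- once sends must complete synchronously, an unsafe send-before-receive ordering hangs at any message size instead of only above the rendezvous threshold. Below is a minimal sketch, hypothetical code rather than the application's, of the kind of pattern this exposes. MPI_Ssend forces synchronous completion, which for this purpose behaves much like setting the eager limits to 0.]

#include <mpi.h>

/* Minimal sketch (not the application's code): both partners send
 * before either receives.  With plain MPI_Send this "works" as long
 * as the message fits in an eager buffer; MPI_Ssend cannot complete
 * until the matching receive is posted, so the hang shows up at any
 * message size. */
int main(int argc, char **argv)
{
    int rank, peer;
    double buf[1024] = {0}, tmp[1024];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    peer = rank ^ 1;   /* pair ranks 0<->1, 2<->3, ...; run with an even rank count */

    /* Both partners block here forever: neither reaches its MPI_Recv. */
    MPI_Ssend(buf, 1024, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);
    MPI_Recv(tmp, 1024, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}

[Swapping MPI_Send for MPI_Ssend in a debug build (or zeroing the eager limits as discussed above) is a common way to make such an ordering fail reproducibly at small scale.]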
>>> Brock Palen wrote:
>>>
>>> Open MPI has different eager limits for all the network types;
>>> on your system run:
>>>
>>> ompi_info --param btl all
>>>
>>> and look for the eager limits.
>>> You can set these values to 0 using the syntax I showed you
>>> before. That would disable eager messages.
>>> There might be a better way to disable eager messages.
>>> Not sure why you would want to disable them; they are there
>>> for performance.
>>>
>>> Maybe you would still see a deadlock if every message was
>>> below the threshold. I think there is a limit on the number
>>> of eager messages a receiving CPU will accept. Not sure
>>> about that though; I still kind of doubt it.
>>>
>>> Try tweaking your buffer sizes: make the openib btl eager
>>> limit the same as shared memory, and see if you get lockups
>>> between hosts and not just over shared memory.
>>>
>>> Brock Palen
>>> www.umich.edu/~brockp
>>> Center for Advanced Computing
>>> brockp_at_[hidden]
>>> (734)936-1985
>>>
>>>
>>>
>>> On Dec 5, 2008, at 2:10 PM, Justin wrote:
>>>
>>> Thank you for this info. I should add that our code
>>> tends to post a lot of sends prior to the other side
>>> posting receives. This causes a lot of unexpected
>>> messages to exist. Our code explicitly matches up all
>>> tags and processors (that is, we do not use MPI
>>> wildcards). If we had a deadlock, I would think we would
>>> see it regardless of whether or not we cross the
>>> rendezvous threshold. I guess one way to test this
>>> would be to set this threshold to 0. If it then
>>> deadlocks, we would likely be able to track down the
>>> deadlock. Are there any other parameters we can pass
>>> MPI that will turn off buffering?
>>>
>>> Thanks,
>>> Justin
>>>
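[Editor's note: a minimal hypothetical sketch (not the application's code) of the pattern just described: one rank posts a burst of nonblocking sends long before the other side posts any receives, then drains them with MPI_Waitsome, the call the stack trace at the bottom of this thread is stuck in. Every message that arrives before its matching receive sits in the receiver's unexpected-message queue, i.e. it has to be buffered eagerly or held back by the rendezvous protocol -- exactly the buffering under discussion.]

#include <mpi.h>
#include <unistd.h>

#define NMSG 512

int main(int argc, char **argv)
{
    int rank, i;
    double msg[NMSG][64];
    MPI_Request reqs[NMSG];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* run with exactly 2 ranks */

    if (rank == 0) {
        /* Burst of sends with explicit tags, no wildcards. */
        for (i = 0; i < NMSG; i++)
            MPI_Isend(msg[i], 64, MPI_DOUBLE, 1, i, MPI_COMM_WORLD, &reqs[i]);

        /* Drain completions; this is the MPI_Waitsome in the stack trace. */
        int done = 0, outcount, indices[NMSG];
        while (done < NMSG) {
            MPI_Waitsome(NMSG, reqs, &outcount, indices, MPI_STATUSES_IGNORE);
            done += outcount;
        }
    } else if (rank == 1) {
        sleep(2);   /* stand-in for computation: the receives are posted late */
        for (i = 0; i < NMSG; i++)
            MPI_Recv(msg[i], 64, MPI_DOUBLE, 0, i, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}

[With small messages (64 doubles here, well under the 4096-byte sm eager limit) the sends complete immediately out of eager buffers; with large messages, rank 0 simply spins in MPI_Waitsome until rank 1 posts the receives. The pattern itself is legal; it just leans on the library's buffering.]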
>>> Brock Palen wrote:
>>>
>>> Whenever this happens we have found the code to have a
>>> deadlock; users never saw it until they crossed the
>>> eager->rendezvous threshold.
>>>
>>> Yes, you can disable shared memory with:
>>>
>>> mpirun --mca btl ^sm
>>>
>>> Or you can try increasing the eager limit.
>>>
>>> ompi_info --param btl sm
>>>
>>> MCA btl: parameter "btl_sm_eager_limit" (current value: "4096")
>>>
>>> You can modify this limit at run time; I think
>>> (can't test it right now) it is just:
>>>
>>> mpirun --mca btl_sm_eager_limit 40960
>>>
>>> I think when tweaking these values you can also use
>>> environment variables in place of putting it all on the
>>> mpirun line:
>>>
>>> export OMPI_MCA_btl_sm_eager_limit=40960
>>>
>>> See: http://www.open-mpi.org/faq/?category=tuning
>>>
>>>
>>> Brock Palen
>>> www.umich.edu/~brockp
>>> Center for Advanced Computing
>>> brockp_at_[hidden]
>>> (734)936-1985
>>>
>>>
>>>
>>> On Dec 5, 2008, at 12:22 PM, Justin wrote:
>>>
>>> Hi,
>>>
>>> We are currently using Open MPI 1.3 on Ranger for
>>> large processor jobs (8K+). Our code appears to
>>> be occasionally deadlocking at random within
>>> point-to-point communication (see the stack trace
>>> below). This code has been tested on many
>>> different MPI versions and, as far as we know, it
>>> does not contain a deadlock. However, in the
>>> past we have run into problems with shared
>>> memory optimizations within MPI causing
>>> deadlocks. We can usually avoid these by
>>> setting a few environment variables to either
>>> increase the size of shared memory buffers or
>>> disable shared memory optimizations altogether.
>>> Does Open MPI have any known deadlocks that
>>> might be causing ours? If so, are there any
>>> workarounds? Also, how do we disable shared
>>> memory within Open MPI?
>>>
>>> Here is an example of where processors are
>>> hanging:
>>>
>>> #0 0x00002b2df3522683 in mca_btl_sm_component_progress () from /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/mca_btl_sm.so
>>> #1 0x00002b2df2cb46bf in mca_bml_r2_progress () from /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/mca_bml_r2.so
>>> #2 0x00002b2df0032ea4 in opal_progress () from /opt/apps/intel10_1/openmpi/1.3/lib/libopen-pal.so.0
>>> #3 0x00002b2ded0d7622 in ompi_request_default_wait_some () from /opt/apps/intel10_1/openmpi/1.3//lib/libmpi.so.0
>>> #4 0x00002b2ded109e34 in PMPI_Waitsome () from /opt/apps/intel10_1/openmpi/1.3//lib/libmpi.so.0
>>>
>>>
>>> Thanks,
>>> Justin
>>>
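[Editor's note: the trace shows the rank spinning inside PMPI_Waitsome, polling the shared-memory BTL for progress. Purely as a general technique -- not something suggested in this thread, and not a fix for any Open MPI bug -- pre-posting receives before the send burst keeps messages out of the unexpected queue and reduces how much the pattern leans on eager or shared-memory buffering. A hypothetical sketch:]

#include <mpi.h>

#define NMSG 512

int main(int argc, char **argv)
{
    int rank, size, i;
    double in[NMSG][64], out[NMSG][64];
    MPI_Request reqs[2 * NMSG];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* 1. Post every receive first, so nothing arrives "unexpected"... */
    for (i = 0; i < NMSG; i++)
        MPI_Irecv(in[i], 64, MPI_DOUBLE, (rank - 1 + size) % size, i,
                  MPI_COMM_WORLD, &reqs[i]);

    /* 2. ...then the burst of sends (a simple ring, just for the sketch)... */
    for (i = 0; i < NMSG; i++)
        MPI_Isend(out[i], 64, MPI_DOUBLE, (rank + 1) % size, i,
                  MPI_COMM_WORLD, &reqs[NMSG + i]);

    /* 3. ...and wait on everything together. */
    MPI_Waitall(2 * NMSG, reqs, MPI_STATUSES_IGNORE);

    MPI_Finalize();
    return 0;
}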
>>
>>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
Cisco Systems