Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Deadlock on large numbers of processors
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-12-11 16:46:10


George --

Is this the same issue that you're working on?

(we have a "blocker" bug for v1.3 about deadlock at heavy messaging
volume -- on Tuesday, it looked like a bug in our freelist...)

On Dec 9, 2008, at 10:28 AM, Justin wrote:

> I have tried disabling shared memory by running with the following
> parameters to mpirun:
>
>   --mca btl openib,self --mca btl_openib_ib_timeout 23
>   --mca btl_openib_use_srq 1 --mca btl_openib_use_rd_max 2048
>
> Unfortunately this did not get rid of the hangs and seems to have made
> them more common. I have now been able to reproduce the deadlock at 32
> processors. I am now working with an MPI deadlock-detection research
> code which will hopefully be able to tell me whether there are any
> deadlocks in our code. In the meantime, if any of you have suggestions
> for Open MPI parameters that might alleviate these deadlocks, I would
> be grateful.
>
>
> Thanks,
> Justin
>
>
>
>
> Rolf Vandevaart wrote:
>>
>> The current version of Open MPI installed on Ranger is 1.3a1r19685,
>> which is from early October. This version has a fix for ticket
>> #1378. Ticket #1449 is not an issue in this case because each node
>> has 16 processors and #1449 is for larger SMPs.
>>
>> However, I am wondering if this is because of ticket https://svn.open-mpi.org/trac/ompi/ticket/1468
>> which was not yet fixed in the version running on ranger.
>>
>> As was suggested earlier, running without the sm btl would give us a
>> clue as to whether this is the problem:
>>
>> mpirun --mca btl ^sm a.out
>>
>> Another way to potentially work around the issue is to increase the
>> size of the shared memory backing file, e.g. to 1 GB:
>>
>> mpirun -mca mpool_sm_min_size 1073741824 -mca mpool_sm_max_size 1073741824 a.out
>>
>> We will also work with TACC to get an upgraded version of Open MPI
>> 1.3 on there.
>>
>> Let us know what you find.
>>
>> Rolf
>>
>>
>> On 12/09/08 08:05, Lenny Verkhovsky wrote:
>>> also see https://svn.open-mpi.org/trac/ompi/ticket/1449
>>>
>>>
>>>
>>> On 12/9/08, Lenny Verkhovsky <lenny.verkhovsky_at_[hidden]> wrote:
>>>
>>> Maybe it's related to https://svn.open-mpi.org/trac/ompi/ticket/1378 ??
>>>
>>>
>>> On 12/5/08, Justin <luitjens_at_[hidden]> wrote:
>>>
>>> The reason I'd like to disable these eager buffers is to help detect
>>> the deadlock better. I would not run this way for a normal run, but
>>> it would be useful for debugging. If the deadlock is indeed due to
>>> our code, then disabling any shared buffers or eager sends would make
>>> that deadlock reproducible. In addition we might be able to lower the
>>> number of processors. Right now, determining which processor is
>>> deadlocked when we are using 8K cores and each processor has hundreds
>>> of messages sent out would be quite difficult.
>>>
>>> Thanks for your suggestions,
>>> Justin
>>>
>>> Brock Palen wrote:
>>>
>>> Open MPI has different eager limits for each of the network types.
>>> On your system run:
>>>
>>>   ompi_info --param btl all
>>>
>>> and look for the eager_limits. You can set these values to 0 using
>>> the syntax I showed you before; that would disable eager messages.
>>> There might be a better way to disable eager messages. Not sure why
>>> you would want to disable them, though; they are there for
>>> performance.
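>>>
>>> If you do want to try it, something like this should push every
>>> message onto the rendezvous path (untested here; double-check the
>>> exact parameter names against your ompi_info output):
>>>
>>>   mpirun --mca btl_sm_eager_limit 0 --mca btl_openib_eager_limit 0 \
>>>          --mca btl_self_eager_limit 0 a.out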
>>>
>>> Maybe you would still see a deadlock even if every message was below
>>> the threshold; I think there is a limit on the number of eager
>>> messages a receiving CPU will accept. Not sure about that, though,
>>> and I still kind of doubt it.
>>>
>>> Try tweaking your buffer sizes: make the openib btl eager limit the
>>> same as the shared memory one, and see if you get lockups between
>>> hosts and not just over shared memory.
>>>
>>> Brock Palen
>>> www.umich.edu/~brockp
>>> Center for Advanced Computing
>>> brockp_at_[hidden]
>>> (734)936-1985
>>>
>>>
>>>
>>> On Dec 5, 2008, at 2:10 PM, Justin wrote:
>>>
>>> Thank you for this info. I should add that our code tends to post a
>>> lot of sends prior to the other side posting receives, which causes a
>>> lot of unexpected messages to exist. Our code explicitly matches up
>>> all tags and processors (that is, we do not use MPI wildcards). If we
>>> had a deadlock I would think we would see it regardless of whether or
>>> not we cross the rendezvous threshold. I guess one way to test this
>>> would be to set this threshold to 0; if it then deadlocks we would
>>> likely be able to track the deadlock down. Are there any other
>>> parameters we can pass MPI that will turn off buffering?
>>>
>>> Thanks,
>>> Justin
>>>
>>> Brock Palen wrote:
>>>
>>> Whenever this happens we have found the code to have a deadlock;
>>> users never saw it until they crossed the eager->rendezvous
>>> threshold.
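>>>
>>> The classic shape of such a bug is sketched below (a made-up
>>> example, not anyone's actual code): both ranks send before either
>>> posts its receive. Small messages complete eagerly into the
>>> receiver's buffers, so it appears to work; once the message size
>>> crosses the rendezvous threshold, both MPI_Send calls block waiting
>>> for a matching receive that never gets posted, and the pair hangs.
>>>
>>>   /* Hypothetical two-rank sketch (run with exactly 2 ranks): it only
>>>    * deadlocks once 'count' exceeds the eager limit, because then both
>>>    * ranks block inside MPI_Send waiting for the other's receive. */
>>>   #include <mpi.h>
>>>   #include <stdlib.h>
>>>
>>>   int main(int argc, char **argv)
>>>   {
>>>       const int count = 1 << 20;   /* 8 MB of doubles: well past eager */
>>>       int rank;
>>>       double *sendbuf, *recvbuf;
>>>
>>>       MPI_Init(&argc, &argv);
>>>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>
>>>       sendbuf = calloc(count, sizeof(double));
>>>       recvbuf = calloc(count, sizeof(double));
>>>
>>>       /* Both ranks send first, then receive. */
>>>       MPI_Send(sendbuf, count, MPI_DOUBLE, 1 - rank, 0, MPI_COMM_WORLD);
>>>       MPI_Recv(recvbuf, count, MPI_DOUBLE, 1 - rank, 0, MPI_COMM_WORLD,
>>>                MPI_STATUS_IGNORE);
>>>
>>>       free(sendbuf);
>>>       free(recvbuf);
>>>       MPI_Finalize();
>>>       return 0;
>>>   }
>>>
>>> Dropping the eager limits to 0 makes even tiny messages take the
>>> rendezvous path, so a bug like this shows up immediately instead of
>>> only at scale.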
>>>
>>> Yes, you can disable shared memory with:
>>>
>>>   mpirun --mca btl ^sm
>>>
>>> Or you can try increasing the eager limit:
>>>
>>>   ompi_info --param btl sm
>>>   MCA btl: parameter "btl_sm_eager_limit" (current value: "4096")
>>>
>>> You can modify this limit at run time; I think (can't test it right
>>> now) it is just:
>>>
>>>   mpirun --mca btl_sm_eager_limit 40960
>>>
>>> I think that when tweaking these values you can also use environment
>>> variables in place of putting it all on the mpirun line:
>>>
>>>   export OMPI_MCA_btl_sm_eager_limit=40960
>>>
>>> See: http://www.open-mpi.org/faq/?category=tuning
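>>>
>>> To double-check that an override was actually picked up, ompi_info
>>> should report the value it sees in the environment (I believe, though
>>> I haven't verified that on this exact version):
>>>
>>>   export OMPI_MCA_btl_sm_eager_limit=40960
>>>   ompi_info --param btl sm | grep eager_limit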
>>>
>>>
>>> Brock Palen
>>> www.umich.edu/~brockp
>>> Center for Advanced Computing
>>> brockp_at_[hidden]
>>> (734)936-1985
>>>
>>>
>>>
>>> On Dec 5, 2008, at 12:22 PM, Justin wrote:
>>>
>>> Hi,
>>>
>>> We are currently using Open MPI 1.3 on Ranger for large processor
>>> jobs (8K+). Our code appears to be occasionally deadlocking at random
>>> within point-to-point communication (see stack trace below). This
>>> code has been tested on many different MPI versions and as far as we
>>> know it does not contain a deadlock. However, in the past we have run
>>> into problems with shared memory optimizations within MPI causing
>>> deadlocks. We can usually avoid these by setting a few environment
>>> variables to either increase the size of shared memory buffers or
>>> disable shared memory optimizations altogether. Does Open MPI have
>>> any known deadlocks that might be causing ours? If so, are there any
>>> workarounds? Also, how do we disable shared memory within Open MPI?
>>>
>>> Here is an example of where processors are hanging:
>>>
>>> #0  0x00002b2df3522683 in mca_btl_sm_component_progress () from /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/mca_btl_sm.so
>>> #1  0x00002b2df2cb46bf in mca_bml_r2_progress () from /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/mca_bml_r2.so
>>> #2  0x00002b2df0032ea4 in opal_progress () from /opt/apps/intel10_1/openmpi/1.3/lib/libopen-pal.so.0
>>> #3  0x00002b2ded0d7622 in ompi_request_default_wait_some () from /opt/apps/intel10_1/openmpi/1.3//lib/libmpi.so.0
>>> #4  0x00002b2ded109e34 in PMPI_Waitsome () from /opt/apps/intel10_1/openmpi/1.3//lib/libmpi.so.0
>>>
>>>
>>> Thanks,
>>> Justin
>>>
>>
>>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
Cisco Systems