Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Deadlock on large numbers of processors
From: Rolf Vandevaart (Rolf.Vandevaart_at_[hidden])
Date: 2008-12-09 09:56:13


The current version of Open MPI installed on Ranger is 1.3a1r19685, which
is from early October. This version has a fix for ticket #1378. Ticket
#1449 is not an issue in this case because each node has 16 processors,
and #1449 applies to larger SMPs.

However, I am wondering if this is because of ticket
https://svn.open-mpi.org/trac/ompi/ticket/1468 which was not yet fixed
in the version running on ranger.

As was suggested earlier, running without the sm btl would be a good way
to tell whether this is the problem:

mpirun --mca btl ^sm a.out
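
Equivalently, you can list only the BTLs you do want instead of
excluding sm. This is just a sketch and assumes the openib BTL is what
the nodes use to talk to each other:

mpirun --mca btl self,openib a.out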

Another way to potentially work around the issue is to increase the size
of the shared memory backing file.

mpirun -mca mpool_sm_min_size 1073741824 -mca mpool_sm_max_size 1073741824 a.out
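
If you want to check the current values first, ompi_info should list the
sm mpool parameters (the exact parameter names can vary a bit between
Open MPI versions):

ompi_info --param mpool sm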

We will also work with TACC to get an upgraded version of Open MPI 1.3
on there.

Let us know what you find.

Rolf

On 12/09/08 08:05, Lenny Verkhovsky wrote:
> also see https://svn.open-mpi.org/trac/ompi/ticket/1449
>
>
>
> On 12/9/08, *Lenny Verkhovsky* <lenny.verkhovsky_at_[hidden]> wrote:
>
> maybe it's related to https://svn.open-mpi.org/trac/ompi/ticket/1378 ??
>
>
> On 12/5/08, *Justin* <luitjens_at_[hidden]> wrote:
>
> The reason I'd like to disable these eager buffers is to help
> detect the deadlock better. I would not run with this for a
> normal run, but it would be useful for debugging. If the
> deadlock is indeed due to our code, then disabling any shared
> buffers or eager sends would make that deadlock reproducible.
> In addition, we might be able to lower the number of processors.
> Right now, determining which processor is deadlocked when
> we are using 8K cores and each processor has hundreds of
> messages sent out would be quite difficult.
>
> Thanks for your suggestions,
> Justin
>
> Brock Palen wrote:
>
> Open MPI has different eager limits for all the network types;
> on your system, run:
>
> ompi_info --param btl all
>
> and look for the eager_limits
> You can set these values to 0 using the syntax I showed you
> before. That would disable eager messages.
> There might be a better way to disable eager messages.
> Not sure why you would want to disable them, though; they are
> there for performance.
>
> Maybe you would still see a deadlock even if every message was
> below the threshold. I think there is a limit on the number
> of eager messages a receiving CPU will accept, but I'm not sure
> about that, and I kind of doubt it is the issue here.
>
> Try tweaking your buffer sizes: make the openib btl eager
> limit the same as the shared memory one, and see if you get
> lockups between hosts and not just over shared memory.
>
> Brock Palen
> www.umich.edu/~brockp
> Center for Advanced Computing
> brockp_at_[hidden]
> (734)936-1985
>
>
>
> On Dec 5, 2008, at 2:10 PM, Justin wrote:
>
> Thank you for this info. I should add that our code
> tends to post a lot of sends prior to the other side
> posting receives, which causes a lot of unexpected
> messages to exist. Our code explicitly matches up all
> tags and processors (that is, we do not use MPI
> wildcards). If we had a deadlock, I would think we would
> see it regardless of whether or not we cross the
> rendezvous threshold. I guess one way to test this
> would be to set this threshold to 0. If it then
> deadlocks, we would likely be able to track down the
> deadlock. Are there any other parameters we can pass
> to MPI that will turn off buffering?
>
> Thanks,
> Justin
>
> Brock Palen wrote:
>
> Whenever this happens, we have found the code to have a
> deadlock; users never saw it until they crossed the
> eager->rendezvous threshold.
>
> Yes you can disable shared memory with:
>
> mpirun --mca btl ^sm
>
> Or you can try increasing the eager limit.
>
> ompi_info --param btl sm
>
> MCA btl: parameter "btl_sm_eager_limit" (current value:
> "4096")
>
> You can modify this limit at run time; I think
> (I can't test it right now) it is just:
>
> mpirun --mca btl_sm_eager_limit 40960
>
> I think that when tweaking these values you can also use
> environment variables in place of putting it all on the mpirun line:
>
> export OMPI_MCA_btl_sm_eager_limit=40960
>
> See:
> http://www.open-mpi.org/faq/?category=tuning
>
>
> Brock Palen
> www.umich.edu/~brockp
> Center for Advanced Computing
> brockp_at_[hidden]
> (734)936-1985
>
>
>
> On Dec 5, 2008, at 12:22 PM, Justin wrote:
>
> Hi,
>
> We are currently using Open MPI 1.3 on Ranger for
> large processor jobs (8K+). Our code appears to
> be occasionally deadlocking at random within
> point-to-point communication (see stack trace
> below). This code has been tested on many
> different MPI versions, and as far as we know it
> does not contain a deadlock. However, in the
> past we have run into problems with shared
> memory optimizations within MPI causing
> deadlocks. We can usually avoid these by
> setting a few environment variables to either
> increase the size of the shared memory buffers or
> disable shared memory optimizations altogether.
> Does Open MPI have any known deadlocks that
> might be causing ours? If so, are there any
> workarounds? Also, how do we disable shared
> memory within Open MPI?
>
> Here is an example of where processors are hanging:
>
> #0 0x00002b2df3522683 in
> mca_btl_sm_component_progress () from
> /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/mca_btl_sm.so
> #1 0x00002b2df2cb46bf in mca_bml_r2_progress ()
> from
> /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/mca_bml_r2.so
> #2 0x00002b2df0032ea4 in opal_progress () from
> /opt/apps/intel10_1/openmpi/1.3/lib/libopen-pal.so.0
> #3 0x00002b2ded0d7622 in
> ompi_request_default_wait_some () from
> /opt/apps/intel10_1/openmpi/1.3//lib/libmpi.so.0
> #4 0x00002b2ded109e34 in PMPI_Waitsome () from
> /opt/apps/intel10_1/openmpi/1.3//lib/libmpi.so.0
>
>
> Thanks,
> Justin

-- 
=========================
rolf.vandevaart_at_[hidden]
781-442-3043
=========================