Unfortunately, with r17988 I now cannot run any MPI programs; they seem
to hang in the modex.
Ralph H Castain wrote:
> Thanks Tim - I found the problem and will commit a fix shortly.
> Appreciate your testing and reporting!
> On 3/27/08 8:24 AM, "Tim Prins" <tprins_at_[hidden]> wrote:
>> This commit breaks things for me. Running on 3 nodes of odin:
>> mpirun -mca btl tcp,sm,self examples/ring_c
>> causes a hang. All of the processes are stuck in
>> orte_grpcomm_base_barrier during MPI_Finalize. Not all programs hang,
>> and the ring program does not hang all the time, but fairly often.
>> rhc_at_[hidden] wrote:
>>> Author: rhc
>>> Date: 2008-03-24 16:50:31 EDT (Mon, 24 Mar 2008)
>>> New Revision: 17941
>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/17941
>>> Fix the allgather and allgather_list functions to avoid deadlocks at large
>>> node/proc counts. The old code violated the RML rules: it received the
>>> allgather buffer and then did an xcast, which causes a send to go out that
>>> is then received by the sender. This fix breaks that pattern by forcing
>>> the recv to complete outside of the function itself, so the allgather and
>>> allgather_list always complete their recvs before returning or sending.
>>> Reorganize the grpcomm code a little to provide support for soon-to-come new
>>> grpcomm components. The revised organization puts what will be common code
>>> elements in the base to avoid duplication, while allowing components that
>>> don't need those functions to ignore them.
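For readers following the thread, the pattern the r17941 commit message describes (finish the recv first, send only afterwards) can be sketched in a few lines of C. This is only an illustration under stated assumptions: `toy_rml`, `toy_send`, `toy_progress`, `gather_state`, `gather_recv_cb` and `toy_allgather` are hypothetical names made up for this sketch and are not ORTE or RML APIs, and the rule "a send issued from inside a receive callback is refused" is a simplified stand-in for the hazard described above, not a statement of how the real RML behaves.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BUF_LEN 64

typedef void (*recv_cb_t)(const char *data, void *ctx);

/* Toy single-threaded messaging layer (hypothetical, NOT the ORTE RML). */
struct toy_rml {
    bool in_callback;   /* true while a receive callback is running */
    bool has_msg;       /* one-slot inbox */
    char inbox[BUF_LEN];
    recv_cb_t cb;       /* posted receive callback, if any */
    void *cb_ctx;
};

/* Loopback send: lands in our own inbox. Assumption for this sketch:
 * sending from inside a receive callback is refused with -1. */
static int toy_send(struct toy_rml *rml, const char *data)
{
    if (rml->in_callback) {
        return -1;
    }
    strncpy(rml->inbox, data, BUF_LEN - 1);
    rml->inbox[BUF_LEN - 1] = '\0';
    rml->has_msg = true;
    return 0;
}

/* Deliver a pending message, if any, to the posted callback. */
static void toy_progress(struct toy_rml *rml)
{
    if (!rml->has_msg || rml->cb == NULL) {
        return;
    }
    rml->has_msg = false;
    rml->in_callback = true;
    rml->cb(rml->inbox, rml->cb_ctx);
    rml->in_callback = false;
}

struct gather_state {
    bool done;
    char data[BUF_LEN];
};

/* The callback does the minimum: copy the payload and flag completion.
 * It never sends. */
static void gather_recv_cb(const char *data, void *ctx)
{
    struct gather_state *st = ctx;
    strncpy(st->data, data, BUF_LEN - 1);
    st->data[BUF_LEN - 1] = '\0';
    st->done = true;
}

/* Wait until the recv has completed, then do the follow-up send
 * (standing in for the xcast). Returns the result of that send. */
static int toy_allgather(struct toy_rml *rml, struct gather_state *st)
{
    assert(rml != NULL && st != NULL);
    st->done = false;
    rml->cb = gather_recv_cb;
    rml->cb_ctx = st;
    while (!st->done) {
        toy_progress(rml);   /* spins until a message has been delivered */
    }
    rml->cb = NULL;
    /* The recv is complete and we are outside the callback: safe to send. */
    return toy_send(rml, st->data);
}
```

In this toy model, queuing one message with `toy_send` and then calling `toy_allgather` completes the recv inside the wait loop and only then issues the follow-up send, which succeeds because no callback is active at that point. Note that the wait loop never returns if no message arrives, which is also how a missing message would show up as a hang.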
>>> Text files modified:
>>> trunk/orte/mca/grpcomm/base/Makefile.am | 5
>>> trunk/orte/mca/grpcomm/base/base.h | 23 +
>>> trunk/orte/mca/grpcomm/base/grpcomm_base_close.c | 4
>>> trunk/orte/mca/grpcomm/base/grpcomm_base_open.c | 1
>>> trunk/orte/mca/grpcomm/base/grpcomm_base_select.c | 121 ++---
>>> trunk/orte/mca/grpcomm/basic/grpcomm_basic.h | 16
>>> trunk/orte/mca/grpcomm/basic/grpcomm_basic_component.c | 30 -
>>> trunk/orte/mca/grpcomm/basic/grpcomm_basic_module.c | 845
>>> trunk/orte/mca/grpcomm/cnos/grpcomm_cnos.h | 8
>>> trunk/orte/mca/grpcomm/cnos/grpcomm_cnos_component.c | 8
>>> trunk/orte/mca/grpcomm/cnos/grpcomm_cnos_module.c | 21
>>> trunk/orte/mca/grpcomm/grpcomm.h | 45 +
>>> trunk/orte/mca/rml/rml_types.h | 31
>>> trunk/orte/orted/orted_comm.c | 27 +
>>> 14 files changed, 226 insertions(+), 959 deletions(-)
>>> Diff not shown due to size (92619 bytes).
>>> To see the diff, run the following command:
>>> svn diff -r 17940:17941 --no-diff-deleted
>>> svn mailing list
>> devel mailing list