Subject: Re: [OMPI users] collective communications broken on more than 4 cores
From: John R. Cary (cary_at_[hidden])
Date: 2009-10-29 12:22:45


This also appears to fix a bug I had reported that did not involve
collective calls. The code is appended below. When run on a 64-bit
architecture with

iter.cary$ gcc --version
gcc (GCC) 4.4.0 20090506 (Red Hat 4.4.0-4)
Copyright (C) 2009 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

iter.cary$ uname -a
Linux iter.txcorp.com 2.6.29.4-167.fc11.x86_64 #1 SMP Wed May 27
17:27:08 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
iter.cary$ mpicc -show
gcc -I/usr/local/openmpi-1.3.2-nodlopen/include -pthread
-L/usr/local/torque-2.4.0b1/lib -Wl,--rpath
-Wl,/usr/local/torque-2.4.0b1/lib
-Wl,-rpath,/usr/local/openmpi-1.3.2-nodlopen/lib
-L/usr/local/openmpi-1.3.2-nodlopen/lib -lmpi -lopen-rte -lopen-pal
-ltorque -ldl -lnsl -lutil -lm

as

  mpirun -n 3 ompi1.3.3-bug

it hangs after somewhere between 100 and 500 iterations. When run as

  mpirun -n 3 -mca btl ^sm ./ompi1.3.3-bug

or

  mpirun -n 3 -mca btl_sm_num_fifos 5 ./ompi1.3.3-bug

it seems to work fine.
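
(Aside: under a batch system it may be more convenient to set the same
MCA parameters through the environment rather than on the mpirun command
line; if I remember the syntax right, either of

  export OMPI_MCA_btl="^sm"
  export OMPI_MCA_btl_sm_num_fifos=5

before the mpirun should be equivalent to the corresponding flag above.)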

Valgrind points to some issues:

==29641== Syscall param sched_setaffinity(mask) points to unaddressable
byte(s)
==29641== at 0x30B5EDAA79: syscall (in /lib64/libc-2.10.1.so)
==29641== by 0x54B5098: opal_paffinity_linux_plpa_api_probe_init (in
/usr/local/openmpi-1.3.2-nodlopen/lib/libopen-pal.so.0.0.0)
==29641== by 0x54B7394: opal_paffinity_linux_plpa_init (in
/usr/local/openmpi-1.3.2-nodlopen/lib/libopen-pal.so.0.0.0)
==29641== by 0x54B5D39:
opal_paffinity_linux_plpa_have_topology_information (in
/usr/local/openmpi-1.3.2-nodlopen/lib/libopen-pal.so.0.0.0)
==29641== by 0x54B4F3F: linux_module_init (in
/usr/local/openmpi-1.3.2-nodlopen/lib/libopen-pal.so.0.0.0)
==29641== by 0x54B2D03: opal_paffinity_base_select (in
/usr/local/openmpi-1.3.2-nodlopen/lib/libopen-pal.so.0.0.0)
==29641== by 0x548C3D3: opal_init (in
/usr/local/openmpi-1.3.2-nodlopen/lib/libopen-pal.so.0.0.0)
==29641== by 0x520F09C: orte_init (in
/usr/local/openmpi-1.3.2-nodlopen/lib/libopen-rte.so.0.0.0)
==29641== by 0x4E67D26: ompi_mpi_init (in
/usr/local/openmpi-1.3.2-nodlopen/lib/libmpi.so.0.0.0)
==29641== by 0x4E87195: PMPI_Init (in
/usr/local/openmpi-1.3.2-nodlopen/lib/libmpi.so.0.0.0)
==29641== by 0x408011: main (in /home/research/cary/ompi1.3.3-bug)
==29641== Address 0x0 is not stack'd, malloc'd or (recently) free'd

==29641== Warning: client syscall munmap tried to modify addresses
0xffffffffffffffff-0xffe
==29640== Warning: client syscall munmap tried to modify addresses
0xffffffffffffffff-0xffe
==29639== Warning: client syscall munmap tried to modify addresses
0xffffffffffffffff-0xffe
==29641==
==29641== Syscall param writev(vector[...]) points to uninitialised byte(s)
==29641== at 0x30B5ED67AB: writev (in /lib64/libc-2.10.1.so)
==29641== by 0x5241686: mca_oob_tcp_msg_send_handler (in
/usr/local/openmpi-1.3.2-nodlopen/lib/libopen-rte.so.0.0.0)
==29641== by 0x52426BC: mca_oob_tcp_peer_send (in
/usr/local/openmpi-1.3.2-nodlopen/lib/libopen-rte.so.0.0.0)
==29641== by 0x52450EC: mca_oob_tcp_send_nb (in
/usr/local/openmpi-1.3.2-nodlopen/lib/libopen-rte.so.0.0.0)
==29641== by 0x5255B33: orte_rml_oob_send_buffer (in
/usr/local/openmpi-1.3.2-nodlopen/lib/libopen-rte.so.0.0.0)
==29641== by 0x5230682: allgather (in
/usr/local/openmpi-1.3.2-nodlopen/lib/libopen-rte.so.0.0.0)
==29641== by 0x5230179: modex (in
/usr/local/openmpi-1.3.2-nodlopen/lib/libopen-rte.so.0.0.0)
==29641== by 0x4E68199: ompi_mpi_init (in
/usr/local/openmpi-1.3.2-nodlopen/lib/libmpi.so.0.0.0)
==29641== by 0x4E87195: PMPI_Init (in
/usr/local/openmpi-1.3.2-nodlopen/lib/libmpi.so.0.0.0)
==29641== by 0x408011: main (in /home/research/cary/ompi1.3.3-bug)
==29641== Address 0x5c89aef is 87 bytes inside a block of size 128 alloc'd
==29641== at 0x4A0763E: malloc (vg_replace_malloc.c:207)
==29641== by 0x548D76A: opal_dss_buffer_extend (in
/usr/local/openmpi-1.3.2-nodlopen/lib/libopen-pal.so.0.0.0)
==29641== by 0x548E780: opal_dss_pack (in
/usr/local/openmpi-1.3.2-nodlopen/lib/libopen-pal.so.0.0.0)
==29641== by 0x5230620: allgather (in
/usr/local/openmpi-1.3.2-nodlopen/lib/libopen-rte.so.0.0.0)
==29641== by 0x5230179: modex (in
/usr/local/openmpi-1.3.2-nodlopen/lib/libopen-rte.so.0.0.0)
==29641== by 0x4E68199: ompi_mpi_init (in
/usr/local/openmpi-1.3.2-nodlopen/lib/libmpi.so.0.0.0)
==29641== by 0x4E87195: PMPI_Init (in
/usr/local/openmpi-1.3.2-nodlopen/lib/libmpi.so.0.0.0)
==29641== by 0x408011: main (in /home/research/cary/ompi1.3.3-bug)

==29640== Conditional jump or move depends on uninitialised value(s)
==29640== at 0x4EF26A4: mca_mpool_sm_alloc (in
/usr/local/openmpi-1.3.2-nodlopen/lib/libmpi.so.0.0.0)
==29640== by 0x4E4BEEF: ompi_free_list_grow (in
/usr/local/openmpi-1.3.2-nodlopen/lib/libmpi.so.0.0.0)
==29640== by 0x4EA8793: mca_btl_sm_add_procs (in
/usr/local/openmpi-1.3.2-nodlopen/lib/libmpi.so.0.0.0)
==29640== by 0x4E9E6E9: mca_bml_r2_add_procs (in
/usr/local/openmpi-1.3.2-nodlopen/lib/libmpi.so.0.0.0)
==29640== by 0x4F0B564: mca_pml_ob1_add_procs (in
/usr/local/openmpi-1.3.2-nodlopen/lib/libmpi.so.0.0.0)
==29640== by 0x4E68288: ompi_mpi_init (in
/usr/local/openmpi-1.3.2-nodlopen/lib/libmpi.so.0.0.0)
==29640== by 0x4E87195: PMPI_Init (in
/usr/local/openmpi-1.3.2-nodlopen/lib/libmpi.so.0.0.0)
==29640== by 0x408011: main (in /home/research/cary/ompi1.3.3-bug)

....John Cary

Vincent Loechner wrote:
>>>>> It seems that the calls to collective communication are not
>>>>> returning for some MPI processes, when the number of processes is
>>>>> greater than or equal to 5. It's reproducible on two different
>>>>> architectures, with two different versions of OpenMPI (1.3.2 and
>>>>> 1.3.3). It was working correctly with OpenMPI version 1.2.7.
>>>>>
>>>> Does it work if you turn off the shared memory transport layer;
>>>> that is,
>>>>
>>>> mpirun -n 6 -mca btl ^sm ./testmpi
>>>>
>>> Yes it does, on both my configurations (AMD and Intel processor).
>>> So it seems that the shared memory synchronization process is
>>> broken.
>>>
>> Presumably that is this bug:
>> https://svn.open-mpi.org/trac/ompi/ticket/2043
>>
>
> Yes it is.
>
>
>> I also found by trial and error that increasing the number of fifos, eg
>> -mca btl_sm_num_fifos 5
>> on a 6-processor job, apparently worked around the problem.
>> But yes, something seems broken in the Open MPI shared memory
>> transport with gcc 4.4.x.
>>
>
> Yes, same for me: -mca btl_sm_num_fifos 5 worked.
> Thanks for your answer Jonathan.
>
> If I can help the developers in any way to track down this bug, please
> get in contact with me.
>

iter.cary$ cat ompi1.3.3-bug.cxx
/**
 * A simple test program to demonstrate a problem in OpenMPI 1.3
 */

// mpi includes
#include <mpi.h>

// std includes
#include <iostream>
#include <vector>

// Size of the array exchanged on every iteration
#define ARRAY_SIZE 250

/**
 * Main driver
 */
int main(int argc, char** argv) {
// Initialize MPI
  MPI_Init(&argc, &argv);

  int rk, sz;
  MPI_Comm_rank(MPI_COMM_WORLD, &rk);
  MPI_Comm_size(MPI_COMM_WORLD, &sz);

// Create some data to pass around
  std::vector<double> d(ARRAY_SIZE);

// Initialize to some values if we aren't rank 0
  if ( rk )
    for ( unsigned i = 0; i < ARRAY_SIZE; ++i )
      d[i] = 2*i + 1;

// Loop until this breaks
  unsigned t = 0;
  while ( 1 ) {
    MPI_Status s;
    if ( rk )
      MPI_Send( &d[0], d.size(), MPI_DOUBLE, 0, 3, MPI_COMM_WORLD );
    else
      for ( int i = 1; i < sz; ++i )
        MPI_Recv( &d[0], d.size(), MPI_DOUBLE, i, 3, MPI_COMM_WORLD, &s );
    MPI_Barrier(MPI_COMM_WORLD);
    std::cout << "Transmission " << ++t << " completed." << std::endl;
  }

// Finalize MPI
  MPI_Finalize();
}
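
Since the subject line is about collectives: the program above exercises
MPI_Send/MPI_Recv plus MPI_Barrier. For reference, a loop of the kind
Vincent describes (collective calls that stop returning once 5 or more
processes are involved) would look roughly like the sketch below. This is
only an illustration, not the test code from the quoted report.

/**
 * Illustrative sketch only: repeat a collective until it stops
 * returning on some ranks.
 */
#include <mpi.h>

#include <iostream>
#include <vector>

#define ARRAY_SIZE 250

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);

  int rk;
  MPI_Comm_rank(MPI_COMM_WORLD, &rk);

  // Per-rank input and a buffer for the reduced result
  std::vector<double> in(ARRAY_SIZE, rk + 1.0), out(ARRAY_SIZE);

  unsigned t = 0;
  while ( 1 ) {
    // Every rank must enter the collective; a hang shows up as some
    // ranks never returning from MPI_Allreduce.
    MPI_Allreduce(&in[0], &out[0], ARRAY_SIZE, MPI_DOUBLE, MPI_SUM,
                  MPI_COMM_WORLD);
    if ( rk == 0 )
      std::cout << "Iteration " << ++t << " completed." << std::endl;
  }

  MPI_Finalize();
}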

-- 
Tech-X Corp., 5621 Arapahoe Ave, Suite A, Boulder CO 80303
cary_at_[hidden], p 303-448-0727, f 303-448-7756, NEW CELL 303-881-8572