
Open MPI Development Mailing List Archives


From: George Bosilca (bosilca_at_[hidden])
Date: 2007-07-11 22:00:22


Lisandro,

The two errors you report are quite different. The first one was
addressed a few days ago in the trunk
(https://svn.open-mpi.org/trac/ompi/changeset/15291). If you use
anything after r15291 instead of 1.2.3, you will be safe in the
threaded case.

The second is different. The problem is that memcpy is a lot faster
than memmove, which is why we use it. The cases where the two data
buffers overlap are quite rare. I'll take a look to see exactly what
is happening there.
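
For what it's worth, here is a minimal C sketch (not Open MPI's actual
code) of the distinction at play: memcpy() has undefined behavior when
the source and destination ranges overlap, while memmove() is required
to handle overlap correctly, at some cost in speed.

   #include <stdio.h>
   #include <string.h>

   int main(void)
   {
       char buf[32] = "abcdefghijklmnop";

       /* An 8-byte overlap on a 16-byte copy, like the one valgrind
        * flagged below (dst 0x4C93EA0, src 0x4C93EA8, len 16).
        * memmove() is safe here; memcpy(buf, buf + 8, 16) would be
        * undefined behavior, even if it happens to work in practice. */
       memmove(buf, buf + 8, 16);
       printf("%s\n", buf);   /* prints "ijklmnop" */
       return 0;
   }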

   george.

On Jul 11, 2007, at 8:08 PM, Lisandro Dalcin wrote:

> Oops, sent to the wrong list; forwarding here...
>
> ---------- Forwarded message ----------
> From: Lisandro Dalcin <dalcinl_at_[hidden]>
> Date: Jul 11, 2007 8:58 PM
> Subject: failures running mpi4py testsuite, perhaps Comm.Split()
> To: Open MPI <bugs_at_[hidden]>
>
>
> Hello all, after a long time I'm here again. I am improving mpi4py
> in order to support MPI threads, and I've found some problems with
> the latest version, 1.2.3.
>
> I've configured with:
>
> $ ./configure --prefix /usr/local/openmpi/1.2.3 --enable-mpi-threads
> --disable-dependency-tracking
>
> However, for the following failure, MPI_Init_thread() was not used.
> This test creates an intercommunicator by using Comm.Split() followed
> by Intracomm.Create_intercomm() (a C sketch of this pattern appears
> after the trace). When running on two or more processes (with one
> process this test is skipped), I sometimes get the following trace:
>
> [trantor:06601] *** Process received signal ***
> [trantor:06601] Signal: Segmentation fault (11)
> [trantor:06601] Signal code: Address not mapped (1)
> [trantor:06601] Failing at address: 0xa8
> [trantor:06601] [ 0] [0x958440]
> [trantor:06601] [ 1]
> /usr/local/openmpi/1.2.3/lib/openmpi/mca_btl_sm.so
> (mca_btl_sm_component_progress+0x1483)
> [0x995553]
> [trantor:06601] [ 2]
> /usr/local/openmpi/1.2.3/lib/openmpi/mca_bml_r2.so
> (mca_bml_r2_progress+0x36)
> [0x645d06]
> [trantor:06601] [ 3]
> /usr/local/openmpi/1.2.3/lib/libopen-pal.so.0(opal_progress+0x58)
> [0x1a2c88]
> [trantor:06601] [ 4]
> /usr/local/openmpi/1.2.3/lib/libmpi.so.0(ompi_request_wait_all+0xea)
> [0x140a8a]
> [trantor:06601] [ 5]
> /usr/local/openmpi/1.2.3/lib/openmpi/mca_coll_tuned.so
> (ompi_coll_tuned_sendrecv_actual+0xc8)
> [0x22d6e8]
> [trantor:06601] [ 6]
> /usr/local/openmpi/1.2.3/lib/openmpi/mca_coll_tuned.so
> (ompi_coll_tuned_allgather_intra_bruck+0xf2)
> [0x231ca2]
> [trantor:06601] [ 7]
> /usr/local/openmpi/1.2.3/lib/openmpi/mca_coll_tuned.so
> (ompi_coll_tuned_allgather_intra_dec_fixed+0x8b)
> [0x22db7b]
> [trantor:06601] [ 8]
> /usr/local/openmpi/1.2.3/lib/libmpi.so.0(ompi_comm_split+0x9d)
> [0x12d92d]
> [trantor:06601] [ 9]
> /usr/local/openmpi/1.2.3/lib/libmpi.so.0(MPI_Comm_split+0xad)
> [0x15a53d]
> [trantor:06601] [10] /u/dalcinl/lib/python/mpi4py/_mpi.so [0x508500]
> [trantor:06601] [11]
> /usr/local/lib/libpython2.5.so.1.0(PyCFunction_Call+0x14d) [0xe150ad]
> [trantor:06601] [12]
> /usr/local/lib/libpython2.5.so.1.0(PyEval_EvalFrameEx+0x64af)
> [0xe626bf]
> [trantor:06601] [13]
> /usr/local/lib/libpython2.5.so.1.0(PyEval_EvalCodeEx+0x7c4) [0xe63814]
> [trantor:06601] [14]
> /usr/local/lib/libpython2.5.so.1.0(PyEval_EvalFrameEx+0x5a43)
> [0xe61c53]
> [trantor:06601] [15]
> /usr/local/lib/libpython2.5.so.1.0(PyEval_EvalFrameEx+0x6130)
> [0xe62340]
> [trantor:06601] [16]
> /usr/local/lib/libpython2.5.so.1.0(PyEval_EvalCodeEx+0x7c4) [0xe63814]
> [trantor:06601] [17] /usr/local/lib/libpython2.5.so.1.0 [0xe01450]
> [trantor:06601] [18]
> /usr/local/lib/libpython2.5.so.1.0(PyObject_Call+0x37) [0xddf5c7]
> [trantor:06601] [19]
> /usr/local/lib/libpython2.5.so.1.0(PyEval_EvalFrameEx+0x42eb)
> [0xe604fb]
> [trantor:06601] [20]
> /usr/local/lib/libpython2.5.so.1.0(PyEval_EvalCodeEx+0x7c4) [0xe63814]
> [trantor:06601] [21] /usr/local/lib/libpython2.5.so.1.0 [0xe0137a]
> [trantor:06601] [22]
> /usr/local/lib/libpython2.5.so.1.0(PyObject_Call+0x37) [0xddf5c7]
> [trantor:06601] [23] /usr/local/lib/libpython2.5.so.1.0 [0xde6de5]
> [trantor:06601] [24]
> /usr/local/lib/libpython2.5.so.1.0(PyObject_Call+0x37) [0xddf5c7]
> [trantor:06601] [25] /usr/local/lib/libpython2.5.so.1.0 [0xe2abc9]
> [trantor:06601] [26]
> /usr/local/lib/libpython2.5.so.1.0(PyObject_Call+0x37) [0xddf5c7]
> [trantor:06601] [27]
> /usr/local/lib/libpython2.5.so.1.0(PyEval_EvalFrameEx+0x1481)
> [0xe5d691]
> [trantor:06601] [28]
> /usr/local/lib/libpython2.5.so.1.0(PyEval_EvalCodeEx+0x7c4) [0xe63814]
> [trantor:06601] [29] /usr/local/lib/libpython2.5.so.1.0 [0xe01450]
> [trantor:06601] *** End of error message ***
>
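> For reference, here is a minimal C sketch of the pattern this test
> exercises (split COMM_WORLD into two halves, then connect them with
> an intercommunicator). The tag and leader ranks below are my own
> illustrative choices, not taken from the actual test:
>
>     #include <mpi.h>
>     #include <stdio.h>
>
>     int main(int argc, char **argv)
>     {
>         /* Run with two or more processes. */
>         int rank, size, color, remote_leader;
>         MPI_Comm intracomm, intercomm;
>
>         MPI_Init(&argc, &argv);
>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>         MPI_Comm_size(MPI_COMM_WORLD, &size);
>
>         /* Lower half gets color 0, upper half color 1. */
>         color = (rank < size / 2) ? 0 : 1;
>         MPI_Comm_split(MPI_COMM_WORLD, color, 0, &intracomm);
>
>         /* Each half's local leader is its rank 0; the remote leader
>          * is rank 0 of the other half in MPI_COMM_WORLD. Tag 1 is
>          * arbitrary. */
>         remote_leader = (color == 0) ? size / 2 : 0;
>         MPI_Intercomm_create(intracomm, 0, MPI_COMM_WORLD,
>                              remote_leader, 1, &intercomm);
>
>         printf("rank %d: done\n", rank);
>         MPI_Comm_free(&intercomm);
>         MPI_Comm_free(&intracomm);
>         MPI_Finalize();
>         return 0;
>     }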
>
> As the problem seems to originate in Comm.Split(), I've written a
> small Python script to test it::
>
> from mpi4py import MPI
>
> # true MPI_COMM_WORLD_HANDLE
> BASECOMM = MPI.__COMM_WORLD__
>
> BASE_SIZE = BASECOMM.Get_size()
> BASE_RANK = BASECOMM.Get_rank()
>
> if BASE_RANK < (BASE_SIZE // 2):
>     COLOR = 0
> else:
>     COLOR = 1
>
> INTRACOMM = BASECOMM.Split(COLOR, key=0)
> print 'Done!!!'
>
> This always seems to work, but running it under valgrind (note that
> valgrind-py below is just an alias adding a suppression file for
> Python) I get the following:
>
> $ mpiexec -n 3 valgrind-py python test.py
>
> ==6727== Warning: set address range perms: large range 134217728
> (defined)
> ==6727== Source and destination overlap in memcpy(0x4C93EA0,
> 0x4C93EA8, 16)
> ==6727== at 0x4006CE6: memcpy (mc_replace_strmem.c:116)
> ==6727== by 0x46C59CA: ompi_ddt_copy_content_same_ddt (in
> /usr/local/openmpi/1.2.3/lib/libmpi.so.0.0.0)
> ==6727== by 0x4BADDCE: ompi_coll_tuned_allgather_intra_bruck (in
> /usr/local/openmpi/1.2.3/lib/openmpi/mca_coll_tuned.so)
> ==6727== by 0x4BA9B7A: ompi_coll_tuned_allgather_intra_dec_fixed
> (in /usr/local/openmpi/1.2.3/lib/openmpi/mca_coll_tuned.so)
> ==6727== by 0x46A692C: ompi_comm_split (in
> /usr/local/openmpi/1.2.3/lib/libmpi.so.0.0.0)
> ==6727== by 0x46D353C: PMPI_Comm_split (in
> /usr/local/openmpi/1.2.3/lib/libmpi.so.0.0.0)
> ==6727== by 0x46754FF: comm_split (in /u/dalcinl/lib/python/
> mpi4py/_mpi.so)
> ==6727== by 0x407D0AC: PyCFunction_Call (methodobject.c:108)
> ==6727== by 0x40CA6BE: PyEval_EvalFrameEx (ceval.c:3564)
> ==6727== by 0x40CB813: PyEval_EvalCodeEx (ceval.c:2831)
> ==6727== by 0x40C9C52: PyEval_EvalFrameEx (ceval.c:3660)
> ==6727== by 0x40CB813: PyEval_EvalCodeEx (ceval.c:2831)
> Done!!!
> Done!!!
> Done!!!
>
>
> I hope you can figure out what is going on. If you need additional
> info/tests, let me know. I have other issues, but those are for
> tomorrow.
>
> Regards,
>
>
> --
> Lisandro Dalcín
> ---------------
> Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
> Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
> Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
> PTLC - Güemes 3450, (3000) Santa Fe, Argentina
> Tel/Fax: +54-(0)342-451.1594
>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel