Open MPI Development Mailing List Archives

From: Lisandro Dalcin (dalcinl_at_[hidden])
Date: 2007-07-11 20:08:53


Oops, sent to the wrong list, forwarding here...

---------- Forwarded message ----------
From: Lisandro Dalcin <dalcinl_at_[hidden]>
Date: Jul 11, 2007 8:58 PM
Subject: failures running mpi4py testsuite, perhaps Comm.Split()
To: Open MPI <bugs_at_[hidden]>

Hello all, after a long time I'm here again. I am improving mpi4py in
order to support MPI threads, and I've run into some problems with the
latest version, 1.2.3.

I've configured with:

$ ./configure --prefix /usr/local/openmpi/1.2.3 --enable-mpi-threads
--disable-dependency-tracking
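
(For what it's worth, the thread support actually compiled in can be
double-checked with the ompi_info tool from this same install, e.g.:

$ ompi_info | grep -i thread
)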

However, for the following failure, MPI_Init_thread() was not used. The
test creates an intercommunicator by using Comm.Split() followed by
Intracomm.Create_intercomm() (a rough sketch of that sequence is included
right after the trace below). When running on two or more processes (on
one process this test is skipped), I sometimes get the following trace:

[trantor:06601] *** Process received signal ***
[trantor:06601] Signal: Segmentation fault (11)
[trantor:06601] Signal code: Address not mapped (1)
[trantor:06601] Failing at address: 0xa8
[trantor:06601] [ 0] [0x958440]
[trantor:06601] [ 1]
/usr/local/openmpi/1.2.3/lib/openmpi/mca_btl_sm.so(mca_btl_sm_component_progress+0x1483)
[0x995553]
[trantor:06601] [ 2]
/usr/local/openmpi/1.2.3/lib/openmpi/mca_bml_r2.so(mca_bml_r2_progress+0x36)
[0x645d06]
[trantor:06601] [ 3]
/usr/local/openmpi/1.2.3/lib/libopen-pal.so.0(opal_progress+0x58)
[0x1a2c88]
[trantor:06601] [ 4]
/usr/local/openmpi/1.2.3/lib/libmpi.so.0(ompi_request_wait_all+0xea)
[0x140a8a]
[trantor:06601] [ 5]
/usr/local/openmpi/1.2.3/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_sendrecv_actual+0xc8)
[0x22d6e8]
[trantor:06601] [ 6]
/usr/local/openmpi/1.2.3/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_allgather_intra_bruck+0xf2)
[0x231ca2]
[trantor:06601] [ 7]
/usr/local/openmpi/1.2.3/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_allgather_intra_dec_fixed+0x8b)
[0x22db7b]
[trantor:06601] [ 8]
/usr/local/openmpi/1.2.3/lib/libmpi.so.0(ompi_comm_split+0x9d)
[0x12d92d]
[trantor:06601] [ 9]
/usr/local/openmpi/1.2.3/lib/libmpi.so.0(MPI_Comm_split+0xad)
[0x15a53d]
[trantor:06601] [10] /u/dalcinl/lib/python/mpi4py/_mpi.so [0x508500]
[trantor:06601] [11]
/usr/local/lib/libpython2.5.so.1.0(PyCFunction_Call+0x14d) [0xe150ad]
[trantor:06601] [12]
/usr/local/lib/libpython2.5.so.1.0(PyEval_EvalFrameEx+0x64af)
[0xe626bf]
[trantor:06601] [13]
/usr/local/lib/libpython2.5.so.1.0(PyEval_EvalCodeEx+0x7c4) [0xe63814]
[trantor:06601] [14]
/usr/local/lib/libpython2.5.so.1.0(PyEval_EvalFrameEx+0x5a43)
[0xe61c53]
[trantor:06601] [15]
/usr/local/lib/libpython2.5.so.1.0(PyEval_EvalFrameEx+0x6130)
[0xe62340]
[trantor:06601] [16]
/usr/local/lib/libpython2.5.so.1.0(PyEval_EvalCodeEx+0x7c4) [0xe63814]
[trantor:06601] [17] /usr/local/lib/libpython2.5.so.1.0 [0xe01450]
[trantor:06601] [18]
/usr/local/lib/libpython2.5.so.1.0(PyObject_Call+0x37) [0xddf5c7]
[trantor:06601] [19]
/usr/local/lib/libpython2.5.so.1.0(PyEval_EvalFrameEx+0x42eb)
[0xe604fb]
[trantor:06601] [20]
/usr/local/lib/libpython2.5.so.1.0(PyEval_EvalCodeEx+0x7c4) [0xe63814]
[trantor:06601] [21] /usr/local/lib/libpython2.5.so.1.0 [0xe0137a]
[trantor:06601] [22]
/usr/local/lib/libpython2.5.so.1.0(PyObject_Call+0x37) [0xddf5c7]
[trantor:06601] [23] /usr/local/lib/libpython2.5.so.1.0 [0xde6de5]
[trantor:06601] [24]
/usr/local/lib/libpython2.5.so.1.0(PyObject_Call+0x37) [0xddf5c7]
[trantor:06601] [25] /usr/local/lib/libpython2.5.so.1.0 [0xe2abc9]
[trantor:06601] [26]
/usr/local/lib/libpython2.5.so.1.0(PyObject_Call+0x37) [0xddf5c7]
[trantor:06601] [27]
/usr/local/lib/libpython2.5.so.1.0(PyEval_EvalFrameEx+0x1481)
[0xe5d691]
[trantor:06601] [28]
/usr/local/lib/libpython2.5.so.1.0(PyEval_EvalCodeEx+0x7c4) [0xe63814]
[trantor:06601] [29] /usr/local/lib/libpython2.5.so.1.0 [0xe01450]
[trantor:06601] *** End of error message ***
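
(For reference, the failing testsuite case boils down to a sequence like
the one below. This is only a sketch, not the actual test code: the leader
ranks and the tag are the values I would expect for a two-halves split, and
the Create_intercomm() calling convention is taken from the current mpi4py
sources, so it may not match the test verbatim.)

from mpi4py import MPI

world = MPI.__COMM_WORLD__
size = world.Get_size()
rank = world.Get_rank()

# split the world into two halves by rank (same as the script further below)
# (as in the testsuite, this only makes sense for two or more processes)
if rank < size // 2:
    color = 0
else:
    color = 1
intracomm = world.Split(color, key=0)

# rank 0 of each half is the local leader; the remote leader is rank 0 of
# the other half, expressed as a rank in the peer (world) communicator
local_leader = 0
if color == 0:
    remote_leader = size // 2
else:
    remote_leader = 0

# join the two halves through an intercommunicator
intercomm = intracomm.Create_intercomm(local_leader, world, remote_leader, 0)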

As the problem seems to originate in Comm.Split(), I've written a small
Python script to test it:

from mpi4py import MPI

# the real MPI_COMM_WORLD handle
BASECOMM = MPI.__COMM_WORLD__

BASE_SIZE = BASECOMM.Get_size()
BASE_RANK = BASECOMM.Get_rank()

if BASE_RANK < (BASE_SIZE // 2):
    COLOR = 0
else:
    COLOR = 1

INTRACOMM = BASECOMM.Split(COLOR, key=0)
print 'Done!!!'

This always seems to work, but running it under valgrind (note that
valgrind-py below is just an alias adding a suppressions file for Python)
I get the following:

mpiexec -n 3 valgrind-py python test.py

==6727== Warning: set address range perms: large range 134217728 (defined)
==6727== Source and destination overlap in memcpy(0x4C93EA0, 0x4C93EA8, 16)
==6727== at 0x4006CE6: memcpy (mc_replace_strmem.c:116)
==6727== by 0x46C59CA: ompi_ddt_copy_content_same_ddt (in
/usr/local/openmpi/1.2.3/lib/libmpi.so.0.0.0)
==6727== by 0x4BADDCE: ompi_coll_tuned_allgather_intra_bruck (in
/usr/local/openmpi/1.2.3/lib/openmpi/mca_coll_tuned.so)
==6727== by 0x4BA9B7A: ompi_coll_tuned_allgather_intra_dec_fixed
(in /usr/local/openmpi/1.2.3/lib/openmpi/mca_coll_tuned.so)
==6727== by 0x46A692C: ompi_comm_split (in
/usr/local/openmpi/1.2.3/lib/libmpi.so.0.0.0)
==6727== by 0x46D353C: PMPI_Comm_split (in
/usr/local/openmpi/1.2.3/lib/libmpi.so.0.0.0)
==6727== by 0x46754FF: comm_split (in /u/dalcinl/lib/python/mpi4py/_mpi.so)
==6727== by 0x407D0AC: PyCFunction_Call (methodobject.c:108)
==6727== by 0x40CA6BE: PyEval_EvalFrameEx (ceval.c:3564)
==6727== by 0x40CB813: PyEval_EvalCodeEx (ceval.c:2831)
==6727== by 0x40C9C52: PyEval_EvalFrameEx (ceval.c:3660)
==6727== by 0x40CB813: PyEval_EvalCodeEx (ceval.c:2831)
Done!!!
Done!!!
Done!!!
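
(For completeness, valgrind-py is roughly equivalent to:

alias valgrind-py='valgrind --suppressions=/path/to/valgrind-python.supp'

that is, plain valgrind plus a suppressions file for the usual Python false
positives, along the lines of the Misc/valgrind-python.supp file shipped
with the Python sources; nothing MPI-related is suppressed.)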

I hope you can figure out what is going on. If you need additional
info or tests, let me know. I have other issues, but those are for tomorrow.

Regards,

--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594