Hi George,
Some more investigation on the Segmentation
fault done with valgrind is shown below.
There seems to be uninitialized parameters
and finally a read at address 0x1, which
causes the segfault. I have checked whether
one of my members appear to be at that
address when constructing the MPI_Datatype, this is not the case.
Maybe you can draw a conclusion from the
information and give me a hint on where
to search futher.
Thanks!
Michael
==4225==
==4225== Syscall param writev(vector[...])
points to uninitialised byte(s)
==4225== at 0x4000792:
(within /lib/ld-2.3.6.so)
==4225== by 0x6915CE5:
mca_btl_tcp_frag_send (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)
==4225== by 0x6915923:
mca_btl_tcp_endpoint_send_handler (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)
==4225== by 0x5941710:
opal_event_base_loop (in
/home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)
==4225== by 0x59418F8:
opal_event_loop (in /home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)
==4225== by 0x593BFFD:
opal_progress (in /home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)
==4225== by 0x68F4384:
mca_pml_ob1_send (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)
==4225== by 0x58B3CCF:
PMPI_Send (in /home/mig/openmpi-1.2.1a-install/lib/libmpi.so.0.0.0)
==4225== by 0x8549D6C:
GP::MPIProcessGroup::send(GP::cstring const&, int, int)
(gpProcessGroup.cpp:326)
==4225== by 0x8549A32:
GP::MPIProcessGroup::send(boost::shared_ptr<GP::Entity>, int, int)
(gpProcessGroup.cpp:263)
==4225== by 0x88286B1:
GP::ParallelDataAccessor::load(boost::shared_ptr<GP::Entity>)
(gpParallelDataAccessor.cpp:105)
==4225== by 0x86F7A5A:
GP::Transactions::create(xercesc_2_7::DOMNode const*) (gpTransaction.cpp:993)
==4225== Address 0x8AD6970 is not
stack'd, malloc'd or (recently) free'd
MPIProcessGroup::send(const MemoryMap ..)
method begin.
MPIProcessGroup::send(const MemoryMap ..)
calling MPI_Send with 12159464 bytes.
==4225==
==4225== Syscall param writev(vector[...])
points to uninitialised byte(s)
==4225== at 0x4000792:
(within /lib/ld-2.3.6.so)
==4225== by 0x6915CE5:
mca_btl_tcp_frag_send (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)
==4225== by 0x6914CB3:
mca_btl_tcp_endpoint_send (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)
==4225== by 0x6912287:
mca_btl_tcp_send (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)
==4225== by 0x68FA38D:
mca_pml_ob1_send_request_start_rndv (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)
==4225== by 0x68F426F:
mca_pml_ob1_send (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)
==4225== by 0x58B3CCF:
PMPI_Send (in /home/mig/openmpi-1.2.1a-install/lib/libmpi.so.0.0.0)
==4225== by 0x8549C05:
GP::MPIProcessGroup::send(GP::MemoryMap const&, int, int)
(gpProcessGroup.cpp:296)
==4225== by 0x8549AA0:
GP::MPIProcessGroup::send(boost::shared_ptr<GP::Entity>, int, int)
(gpProcessGroup.cpp:271)
==4225== by 0x88286B1:
GP::ParallelDataAccessor::load(boost::shared_ptr<GP::Entity>)
(gpParallelDataAccessor.cpp:105)
==4225== by 0x86F7A5A:
GP::Transactions::create(xercesc_2_7::DOMNode const*) (gpTransaction.cpp:993)
==4225== by 0x85461B3:
GP::Factory<boost::shared_ptr<GP::XmlBase>, std::string,
boost::shared_ptr<GP::XmlBase> (*)(xercesc_2_7::DOMNode const*),
GP::DefaultFactoryError>::createObject(xercesc_2_7::DOMNode const*)
(gpXmlFactoryParser.cpp:46)
==4225== Address 0x8AD6970 is not
stack'd, malloc'd or (recently) free'd
==4226==
==4226== Syscall param writev(vector[...])
points to uninitialised byte(s)
==4226== at 0x4000792:
(within /lib/ld-2.3.6.so)
==4226== by 0x6915CE5:
mca_btl_tcp_frag_send (in /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)
==4226== by 0x6914CB3:
mca_btl_tcp_endpoint_send (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)
==4226== by 0x6912287:
mca_btl_tcp_send (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)
==4226== by 0x68F6C02:
mca_pml_ob1_recv_request_ack_send_btl (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)
==4226== by 0x68F7CB6:
mca_pml_ob1_recv_request_progress (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)
==4226== by 0x68F5D77:
mca_pml_ob1_recv_frag_match (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)
==4226== by 0x6915178:
mca_btl_tcp_endpoint_recv_handler (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)
==4226== by 0x5941710:
opal_event_base_loop (in
/home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)
==4226== by 0x59418F8:
opal_event_loop (in /home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)
==4226== by 0x593BFFD:
opal_progress (in /home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)
==4226== by 0x68F3084:
mca_pml_ob1_recv (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)
==4226== Address 0x8AD6970 is not
stack'd, malloc'd or (recently) free'd
==4225==
==4225== Invalid read of size 1
==4225== at 0x401ECD0:
memcpy (mc_replace_strmem.c:406)
==4225== by 0x589216E:
ompi_generic_simple_pack (in
/home/mig/openmpi-1.2.1a-install/lib/libmpi.so.0.0.0)
==4225== by 0x5895F1A:
ompi_convertor_pack (in /home/mig/openmpi-1.2.1a-install/lib/libmpi.so.0.0.0)
==4225== by 0x6911A07:
mca_btl_tcp_prepare_src (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)
==4225== by 0x68F9FDB:
mca_pml_ob1_send_request_schedule_exclusive (in /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)
==4225== by 0x68FB717:
mca_pml_ob1_frag_completion (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)
==4225== by 0x6915905:
mca_btl_tcp_endpoint_send_handler (in /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)
==4225== by 0x5941710:
opal_event_base_loop (in
/home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)
==4225== by 0x59418F8:
opal_event_loop (in /home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)
==4225== by 0x593BFFD:
opal_progress (in /home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)
==4225== by 0x68F4384:
mca_pml_ob1_send (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)
==4225== by 0x58B3CCF:
PMPI_Send (in /home/mig/openmpi-1.2.1a-install/lib/libmpi.so.0.0.0)
==4225== Address 0x1 is not stack'd,
malloc'd or (recently) free'd
[head:04225] *** Process received signal
***
[head:04225] Signal: Segmentation fault
(11)
[head:04225] Signal code: Address not
mapped (1)
[head:04225] Failing at address: 0x1
[head:04225] [ 0]
/lib/tls/i686/cmov/libpthread.so.0 [0x5ac3bd0]
[head:04225] [ 1]
/home/mig/openmpi-1.2.1a-install/lib/libmpi.so.0(ompi_generic_simple_pack+0x2ff)
[0x589216f]
[head:04225] [ 2]
/home/mig/openmpi-1.2.1a-install/lib/libmpi.so.0(ompi_convertor_pack+0x12b)
[0x5895f1b]
[head:04225] [ 3]
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_prepare_src+0x198)
[0x6911a08]
[head:04225] [ 4]
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send_request_schedule_exclusive+0x18c)
[0x68f9fdc]
[head:04225] [ 5]
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so [0x68fb718]
[head:04225] [ 6]
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so [0x6915906]
[head:04225] [ 7] /home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0(opal_event_base_loop+0x391)
[0x5941711]
[head:04225] [ 8]
/home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0(opal_event_loop+0x29)
[0x59418f9]
[head:04225] [ 9]
/home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0(opal_progress+0xbe)
[0x593bffe]
[head:04225] [10]
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0x5d5)
[0x68f4385]
[head:04225] [11]
/home/mig/openmpi-1.2.1a-install/lib/libmpi.so.0(MPI_Send+0x170) [0x58b3cd0]
[head:04225] [12]
/opt/plato/release_1.0/bin/engine(_ZN2GP15MPIProcessGroup4sendERKNS_9MemoryMapEii+0xee)
[0x8549c06]
[head:04225] [13]
/opt/plato/release_1.0/bin/engine(_ZN2GP15MPIProcessGroup4sendEN5boost10shared_ptrINS_6EntityEEEii+0x165)
[0x8549aa1]
[head:04225] [14]
/opt/plato/release_1.0/bin/engine(_ZN2GP20ParallelDataAccessor4loadEN5boost10shared_ptrINS_6EntityEEE+0x15c)
[0x88286b2]
[head:04225] [15]
/opt/plato/release_1.0/bin/engine(_ZN2GP12Transactions6createEPKN11xercesc_2_77DOMNodeE+0x361)
[0x86f7a5b]
[head:04225] [16]
/opt/plato/release_1.0/bin/engine(_ZN2GP7FactoryIN5boost10shared_ptrINS_7XmlBaseEEESsPFS4_PKN11xercesc_2_77DOMNodeEENS_19DefaultFactoryErrorEE12createObjectES8_+0xde)
[0x85461b4]
[head:04225] [17]
/opt/plato/release_1.0/bin/engine(_ZN2GP16XmlFactoryParser7descentEPN11xercesc_2_77DOMNodeEb+0x5c9)
[0x85469d7]
[head:04225] [18]
/opt/plato/release_1.0/bin/engine(_ZN2GP9XmlParser8traverseEb+0x1d8)
[0x8542c90]
[head:04225] [19]
/opt/plato/release_1.0/bin/engine(_ZN2GP16XmlFactoryParser8traverseEb+0x1f)
[0x85463ff]
[head:04225] [20]
/opt/plato/release_1.0/bin/engine(main+0x1566) [0x84d02b0]
[head:04225] [21]
/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xc8) [0x5adeea8]
[head:04225] [22]
/opt/plato/release_1.0/bin/engine(__gxx_personality_v0+0x1ad) [0x84cbe31]
[head:04225] *** End of error message ***
==4225==
==4225== Process terminating with default
action of signal 11 (SIGSEGV)
==4225== Access not within mapped
region at address 0x1
==4225== at 0x401ECD0:
memcpy (mc_replace_strmem.c:406)
==4225== by 0x589216E:
ompi_generic_simple_pack (in
/home/mig/openmpi-1.2.1a-install/lib/libmpi.so.0.0.0)
==4225== by 0x5895F1A:
ompi_convertor_pack (in /home/mig/openmpi-1.2.1a-install/lib/libmpi.so.0.0.0)
==4225== by 0x6911A07:
mca_btl_tcp_prepare_src (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)
==4225== by 0x68F9FDB:
mca_pml_ob1_send_request_schedule_exclusive (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)
==4225== by 0x68FB717:
mca_pml_ob1_frag_completion (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)
==4225== by 0x6915905:
mca_btl_tcp_endpoint_send_handler (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)
==4225== by 0x5941710:
opal_event_base_loop (in
/home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)
==4225== by 0x59418F8:
opal_event_loop (in /home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)
==4225== by 0x593BFFD:
opal_progress (in /home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)
==4225== by 0x68F4384:
mca_pml_ob1_send (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)
==4225== by 0x58B3CCF:
PMPI_Send (in /home/mig/openmpi-1.2.1a-install/lib/libmpi.so.0.0.0)
==4225==
==4225== ERROR SUMMARY: 239 errors from 43
contexts (suppressed: 63 from 1)
==4225== malloc/free: in use at exit:
51,796,553 bytes in 138,247 blocks.
==4225== malloc/free: 6,837,039 allocs,
6,698,792 frees, 293,385,873 bytes allocated.
==4225== For counts of detected errors,
rerun with: -v
==4225== searching for pointers to 138,247
not-freed blocks.
==4225== checked 164,732,924 bytes.
==4225==
==4225== LEAK SUMMARY:
==4225== definitely lost:
4,455 bytes in 249 blocks.
==4225==
possibly lost: 226,109 bytes in 5,823 blocks.
==4225== still reachable:
51,565,989 bytes in 132,175 blocks.
==4225==
suppressed: 0 bytes in 0 blocks.
==4225== Use --leak-check=full to see
details of leaked memory.
==4226==
==4226== ERROR SUMMARY: 227 errors from 41
contexts (suppressed: 63 from 1)
==4226== malloc/free: in use at exit:
50,764,389 bytes in 138,250 blocks.
==4226== malloc/free: 143,586 allocs, 5,336
frees, 103,910,477 bytes allocated.
==4226== For counts of detected errors,
rerun with: -v
==4227==
==4227== ERROR SUMMARY: 225 errors from 40
contexts (suppressed: 63 from 1)
==4227== malloc/free: in use at exit:
515,706 bytes in 11,582 blocks.
==4227== malloc/free: 16,868 allocs, 5,286
frees, 4,054,035 bytes allocated.
==4227== For counts of detected errors,
rerun with: -v
==4227== searching for pointers to 11,582
not-freed blocks.
==4227== checked 7,295,768 bytes.
==4227==
==4227== LEAK SUMMARY:
==4227== definitely lost:
4,455 bytes in 249 blocks.
==4227==
possibly lost: 226,109 bytes in 5,823 blocks.
==4227== still reachable:
285,142 bytes in 5,510 blocks.
==4227==
suppressed: 0 bytes in 0 blocks.