Open MPI User's Mailing List Archives

From: Michael Gauckler (michael_at_[hidden])
Date: 2007-04-18 08:27:46


Hi George,

 

Some more investigation of the segmentation fault, done with valgrind, is
shown below.

There seem to be uninitialized parameters and, finally, a read at address
0x1, which causes the segfault. I have checked whether any of my members
appears to be at that address when constructing the MPI_Datatype; this is
not the case.
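
For reference, the kind of check described above could be sketched roughly as follows, without any MPI calls. Everything here is hypothetical: `Record` stands in for the real payload type, and `MemberInfo`, `describe_record` and `find_member_at` are made-up helper names. The idea is only to compare each member's effective address (buffer base plus displacement) with the faulting address reported by valgrind.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record type; the real type behind the failing send is not shown here.
struct Record {
    char   tag;
    int    id;
    double value;
};

// One entry per member, analogous to one slot of the displacement array
// that is filled in when a struct datatype is described to MPI.
struct MemberInfo {
    const char*    name;
    std::ptrdiff_t displacement;  // byte offset from the start of the record
};

// Displacements taken relative to the start of the struct.
std::vector<MemberInfo> describe_record() {
    return {
        {"tag",   static_cast<std::ptrdiff_t>(offsetof(Record, tag))},
        {"id",    static_cast<std::ptrdiff_t>(offsetof(Record, id))},
        {"value", static_cast<std::ptrdiff_t>(offsetof(Record, value))},
    };
}

// Returns the index of the member whose effective address
// (base + displacement) equals `fault`, or -1 if no member matches.
int find_member_at(const void* base,
                   const std::vector<MemberInfo>& members,
                   std::uintptr_t fault) {
    const std::uintptr_t start = reinterpret_cast<std::uintptr_t>(base);
    for (std::size_t i = 0; i < members.size(); ++i) {
        assert(members[i].displacement >= 0);  // this sketch only handles forward offsets
        const std::uintptr_t addr =
            start + static_cast<std::uintptr_t>(members[i].displacement);
        if (addr == fault) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

Calling `find_member_at(&record, describe_record(), 0x1)` on a real object returns -1, which matches what I observed. Passing a null base makes the displacements themselves the effective addresses, so the same helper can also be used to check whether a relative displacement is being treated as an absolute address.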

 

Maybe you can draw a conclusion from this information and give me a hint on
where to search further.

 

Thanks!

  Michael

 

==4225==

==4225== Syscall param writev(vector[...]) points to uninitialised byte(s)

==4225== at 0x4000792: (within /lib/ld-2.3.6.so)

==4225== by 0x6915CE5: mca_btl_tcp_frag_send (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)

==4225== by 0x6915923: mca_btl_tcp_endpoint_send_handler (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)

==4225== by 0x5941710: opal_event_base_loop (in
/home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)

==4225== by 0x59418F8: opal_event_loop (in
/home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)

==4225== by 0x593BFFD: opal_progress (in
/home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)

==4225== by 0x68F4384: mca_pml_ob1_send (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)

==4225== by 0x58B3CCF: PMPI_Send (in
/home/mig/openmpi-1.2.1a-install/lib/libmpi.so.0.0.0)

==4225== by 0x8549D6C: GP::MPIProcessGroup::send(GP::cstring const&, int,
int) (gpProcessGroup.cpp:326)

==4225== by 0x8549A32:
GP::MPIProcessGroup::send(boost::shared_ptr<GP::Entity>, int, int)
(gpProcessGroup.cpp:263)

==4225== by 0x88286B1:
GP::ParallelDataAccessor::load(boost::shared_ptr<GP::Entity>)
(gpParallelDataAccessor.cpp:105)

==4225== by 0x86F7A5A: GP::Transactions::create(xercesc_2_7::DOMNode
const*) (gpTransaction.cpp:993)

==4225== Address 0x8AD6970 is not stack'd, malloc'd or (recently) free'd

MPIProcessGroup::send(const MemoryMap ..) method begin.

MPIProcessGroup::send(const MemoryMap ..) calling MPI_Send with 12159464
bytes.

==4225==

==4225== Syscall param writev(vector[...]) points to uninitialised byte(s)

==4225== at 0x4000792: (within /lib/ld-2.3.6.so)

==4225== by 0x6915CE5: mca_btl_tcp_frag_send (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)

==4225== by 0x6914CB3: mca_btl_tcp_endpoint_send (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)

==4225== by 0x6912287: mca_btl_tcp_send (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)

==4225== by 0x68FA38D: mca_pml_ob1_send_request_start_rndv (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)

==4225== by 0x68F426F: mca_pml_ob1_send (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)

==4225== by 0x58B3CCF: PMPI_Send (in
/home/mig/openmpi-1.2.1a-install/lib/libmpi.so.0.0.0)

==4225== by 0x8549C05: GP::MPIProcessGroup::send(GP::MemoryMap const&,
int, int) (gpProcessGroup.cpp:296)

==4225== by 0x8549AA0:
GP::MPIProcessGroup::send(boost::shared_ptr<GP::Entity>, int, int)
(gpProcessGroup.cpp:271)

==4225== by 0x88286B1:
GP::ParallelDataAccessor::load(boost::shared_ptr<GP::Entity>)
(gpParallelDataAccessor.cpp:105)

==4225== by 0x86F7A5A: GP::Transactions::create(xercesc_2_7::DOMNode
const*) (gpTransaction.cpp:993)

==4225== by 0x85461B3: GP::Factory<boost::shared_ptr<GP::XmlBase>,
std::string, boost::shared_ptr<GP::XmlBase> (*)(xercesc_2_7::DOMNode
const*), GP::DefaultFactoryError>::createObject(xercesc_2_7::DOMNode const*)
(gpXmlFactoryParser.cpp:46)

==4225== Address 0x8AD6970 is not stack'd, malloc'd or (recently) free'd

==4226==

==4226== Syscall param writev(vector[...]) points to uninitialised byte(s)

==4226== at 0x4000792: (within /lib/ld-2.3.6.so)

==4226== by 0x6915CE5: mca_btl_tcp_frag_send (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)

==4226== by 0x6914CB3: mca_btl_tcp_endpoint_send (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)

==4226== by 0x6912287: mca_btl_tcp_send (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)

==4226== by 0x68F6C02: mca_pml_ob1_recv_request_ack_send_btl (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)

==4226== by 0x68F7CB6: mca_pml_ob1_recv_request_progress (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)

==4226== by 0x68F5D77: mca_pml_ob1_recv_frag_match (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)

==4226== by 0x6915178: mca_btl_tcp_endpoint_recv_handler (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)

==4226== by 0x5941710: opal_event_base_loop (in
/home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)

==4226== by 0x59418F8: opal_event_loop (in
/home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)

==4226== by 0x593BFFD: opal_progress (in
/home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)

==4226== by 0x68F3084: mca_pml_ob1_recv (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)

==4226== Address 0x8AD6970 is not stack'd, malloc'd or (recently) free'd

==4225==

==4225== Invalid read of size 1

==4225== at 0x401ECD0: memcpy (mc_replace_strmem.c:406)

==4225== by 0x589216E: ompi_generic_simple_pack (in
/home/mig/openmpi-1.2.1a-install/lib/libmpi.so.0.0.0)

==4225== by 0x5895F1A: ompi_convertor_pack (in
/home/mig/openmpi-1.2.1a-install/lib/libmpi.so.0.0.0)

==4225== by 0x6911A07: mca_btl_tcp_prepare_src (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)

==4225== by 0x68F9FDB: mca_pml_ob1_send_request_schedule_exclusive (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)

==4225== by 0x68FB717: mca_pml_ob1_frag_completion (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)

==4225== by 0x6915905: mca_btl_tcp_endpoint_send_handler (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)

==4225== by 0x5941710: opal_event_base_loop (in
/home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)

==4225== by 0x59418F8: opal_event_loop (in
/home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)

==4225== by 0x593BFFD: opal_progress (in
/home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)

==4225== by 0x68F4384: mca_pml_ob1_send (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)

==4225== by 0x58B3CCF: PMPI_Send (in
/home/mig/openmpi-1.2.1a-install/lib/libmpi.so.0.0.0)

==4225== Address 0x1 is not stack'd, malloc'd or (recently) free'd

[head:04225] *** Process received signal ***

[head:04225] Signal: Segmentation fault (11)

[head:04225] Signal code: Address not mapped (1)

[head:04225] Failing at address: 0x1

[head:04225] [ 0] /lib/tls/i686/cmov/libpthread.so.0 [0x5ac3bd0]

[head:04225] [ 1] /home/mig/openmpi-1.2.1a-install/lib/libmpi.so.0(ompi_generic_simple_pack+0x2ff) [0x589216f]

[head:04225] [ 2] /home/mig/openmpi-1.2.1a-install/lib/libmpi.so.0(ompi_convertor_pack+0x12b) [0x5895f1b]

[head:04225] [ 3] /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_prepare_src+0x198) [0x6911a08]

[head:04225] [ 4] /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send_request_schedule_exclusive+0x18c) [0x68f9fdc]

[head:04225] [ 5] /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so [0x68fb718]

[head:04225] [ 6] /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so [0x6915906]

[head:04225] [ 7] /home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0(opal_event_base_loop+0x391) [0x5941711]

[head:04225] [ 8] /home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0(opal_event_loop+0x29) [0x59418f9]

[head:04225] [ 9] /home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0(opal_progress+0xbe) [0x593bffe]

[head:04225] [10] /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0x5d5) [0x68f4385]

[head:04225] [11] /home/mig/openmpi-1.2.1a-install/lib/libmpi.so.0(MPI_Send+0x170) [0x58b3cd0]

[head:04225] [12] /opt/plato/release_1.0/bin/engine(_ZN2GP15MPIProcessGroup4sendERKNS_9MemoryMapEii+0xee) [0x8549c06]

[head:04225] [13] /opt/plato/release_1.0/bin/engine(_ZN2GP15MPIProcessGroup4sendEN5boost10shared_ptrINS_6EntityEEEii+0x165) [0x8549aa1]

[head:04225] [14] /opt/plato/release_1.0/bin/engine(_ZN2GP20ParallelDataAccessor4loadEN5boost10shared_ptrINS_6EntityEEE+0x15c) [0x88286b2]

[head:04225] [15] /opt/plato/release_1.0/bin/engine(_ZN2GP12Transactions6createEPKN11xercesc_2_77DOMNodeE+0x361) [0x86f7a5b]

[head:04225] [16] /opt/plato/release_1.0/bin/engine(_ZN2GP7FactoryIN5boost10shared_ptrINS_7XmlBaseEEESsPFS4_PKN11xercesc_2_77DOMNodeEENS_19DefaultFactoryErrorEE12createObjectES8_+0xde) [0x85461b4]

[head:04225] [17] /opt/plato/release_1.0/bin/engine(_ZN2GP16XmlFactoryParser7descentEPN11xercesc_2_77DOMNodeEb+0x5c9) [0x85469d7]

[head:04225] [18] /opt/plato/release_1.0/bin/engine(_ZN2GP9XmlParser8traverseEb+0x1d8) [0x8542c90]

[head:04225] [19] /opt/plato/release_1.0/bin/engine(_ZN2GP16XmlFactoryParser8traverseEb+0x1f) [0x85463ff]

[head:04225] [20] /opt/plato/release_1.0/bin/engine(main+0x1566) [0x84d02b0]

[head:04225] [21] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xc8) [0x5adeea8]

[head:04225] [22] /opt/plato/release_1.0/bin/engine(__gxx_personality_v0+0x1ad) [0x84cbe31]

[head:04225] *** End of error message ***

==4225==

==4225== Process terminating with default action of signal 11 (SIGSEGV)

==4225== Access not within mapped region at address 0x1

==4225== at 0x401ECD0: memcpy (mc_replace_strmem.c:406)

==4225== by 0x589216E: ompi_generic_simple_pack (in
/home/mig/openmpi-1.2.1a-install/lib/libmpi.so.0.0.0)

==4225== by 0x5895F1A: ompi_convertor_pack (in
/home/mig/openmpi-1.2.1a-install/lib/libmpi.so.0.0.0)

==4225== by 0x6911A07: mca_btl_tcp_prepare_src (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)

==4225== by 0x68F9FDB: mca_pml_ob1_send_request_schedule_exclusive (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)

==4225== by 0x68FB717: mca_pml_ob1_frag_completion (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)

==4225== by 0x6915905: mca_btl_tcp_endpoint_send_handler (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)

==4225== by 0x5941710: opal_event_base_loop (in
/home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)

==4225== by 0x59418F8: opal_event_loop (in
/home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)

==4225== by 0x593BFFD: opal_progress (in
/home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)

==4225== by 0x68F4384: mca_pml_ob1_send (in
/home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)

==4225== by 0x58B3CCF: PMPI_Send (in
/home/mig/openmpi-1.2.1a-install/lib/libmpi.so.0.0.0)

==4225==

==4225== ERROR SUMMARY: 239 errors from 43 contexts (suppressed: 63 from 1)

==4225== malloc/free: in use at exit: 51,796,553 bytes in 138,247 blocks.

==4225== malloc/free: 6,837,039 allocs, 6,698,792 frees, 293,385,873 bytes
allocated.

==4225== For counts of detected errors, rerun with: -v

==4225== searching for pointers to 138,247 not-freed blocks.

==4225== checked 164,732,924 bytes.

==4225==

==4225== LEAK SUMMARY:

==4225== definitely lost: 4,455 bytes in 249 blocks.

==4225== possibly lost: 226,109 bytes in 5,823 blocks.

==4225== still reachable: 51,565,989 bytes in 132,175 blocks.

==4225== suppressed: 0 bytes in 0 blocks.

==4225== Use --leak-check=full to see details of leaked memory.

==4226==

==4226== ERROR SUMMARY: 227 errors from 41 contexts (suppressed: 63 from 1)

==4226== malloc/free: in use at exit: 50,764,389 bytes in 138,250 blocks.

==4226== malloc/free: 143,586 allocs, 5,336 frees, 103,910,477 bytes
allocated.

==4226== For counts of detected errors, rerun with: -v

==4227==

==4227== ERROR SUMMARY: 225 errors from 40 contexts (suppressed: 63 from 1)

==4227== malloc/free: in use at exit: 515,706 bytes in 11,582 blocks.

==4227== malloc/free: 16,868 allocs, 5,286 frees, 4,054,035 bytes allocated.

==4227== For counts of detected errors, rerun with: -v

==4227== searching for pointers to 11,582 not-freed blocks.

==4227== checked 7,295,768 bytes.

==4227==

==4227== LEAK SUMMARY:

==4227== definitely lost: 4,455 bytes in 249 blocks.

==4227== possibly lost: 226,109 bytes in 5,823 blocks.

==4227== still reachable: 285,142 bytes in 5,510 blocks.

==4227== suppressed: 0 bytes in 0 blocks.