Hi George,

Below is the output of some further investigation of the segmentation fault with valgrind.

The writev() syscall parameters appear to point to uninitialised bytes, and finally there is an invalid read at address 0x1, which causes the segfault. I have checked whether any of my members is located at that address when constructing the MPI_Datatype; this is not the case.
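A minimal sketch of that kind of check, assuming each block added to the datatype is recorded together with its absolute address (as something like MPI_Get_address would report it) and its length in bytes. `Block`, `blocksCovering` and `suspiciousBlocks` are hypothetical names for illustration only, not part of the GP code or of the MPI API, and the 4096-byte threshold is an arbitrary assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record of one block added to the MPI_Datatype:
// the member's name, its absolute address and its length in bytes.
struct Block {
    std::string name;
    std::uintptr_t address;
    std::size_t length;
};

// Returns the names of all blocks whose byte range [address, address + length)
// contains the faulting address reported by valgrind (e.g. 0x1).
std::vector<std::string> blocksCovering(const std::vector<Block>& blocks,
                                        std::uintptr_t faulting) {
    std::vector<std::string> hits;
    for (const Block& b : blocks) {
        if (faulting >= b.address && faulting - b.address < b.length) {
            hits.push_back(b.name);
        }
    }
    return hits;
}

// Returns the names of all blocks starting below `threshold` (assumed 4096 here).
// Such a low value is unlikely to be a real member address.
std::vector<std::string> suspiciousBlocks(const std::vector<Block>& blocks,
                                          std::uintptr_t threshold = 4096) {
    std::vector<std::string> hits;
    for (const Block& b : blocks) {
        if (b.address < threshold) {
            hits.push_back(b.name);
        }
    }
    return hits;
}
```

If no recorded block covers 0x1, the bad pointer probably does not come directly from a member address. One possibility worth ruling out (an assumption, not something the log proves) is a relative displacement or an uninitialised value being used where an absolute address is expected, for example when sending from MPI_BOTTOM.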

Maybe you can draw a conclusion from this information and give me a hint on where to search further.

Thanks!

  Michael

==4225==

==4225== Syscall param writev(vector[...]) points to uninitialised byte(s)

==4225==    at 0x4000792: (within /lib/ld-2.3.6.so)

==4225==    by 0x6915CE5: mca_btl_tcp_frag_send (in /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)

==4225==    by 0x6915923: mca_btl_tcp_endpoint_send_handler (in /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)

==4225==    by 0x5941710: opal_event_base_loop (in /home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)

==4225==    by 0x59418F8: opal_event_loop (in /home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)

==4225==    by 0x593BFFD: opal_progress (in /home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)

==4225==    by 0x68F4384: mca_pml_ob1_send (in /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)

==4225==    by 0x58B3CCF: PMPI_Send (in /home/mig/openmpi-1.2.1a-install/lib/libmpi.so.0.0.0)

==4225==    by 0x8549D6C: GP::MPIProcessGroup::send(GP::cstring const&, int, int) (gpProcessGroup.cpp:326)

==4225==    by 0x8549A32: GP::MPIProcessGroup::send(boost::shared_ptr<GP::Entity>, int, int) (gpProcessGroup.cpp:263)

==4225==    by 0x88286B1: GP::ParallelDataAccessor::load(boost::shared_ptr<GP::Entity>) (gpParallelDataAccessor.cpp:105)

==4225==    by 0x86F7A5A: GP::Transactions::create(xercesc_2_7::DOMNode const*) (gpTransaction.cpp:993)

==4225==  Address 0x8AD6970 is not stack'd, malloc'd or (recently) free'd

MPIProcessGroup::send(const MemoryMap ..) method begin.

MPIProcessGroup::send(const MemoryMap ..) calling MPI_Send with 12159464 bytes.

==4225==

==4225== Syscall param writev(vector[...]) points to uninitialised byte(s)

==4225==    at 0x4000792: (within /lib/ld-2.3.6.so)

==4225==    by 0x6915CE5: mca_btl_tcp_frag_send (in /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)

==4225==    by 0x6914CB3: mca_btl_tcp_endpoint_send (in /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)

==4225==    by 0x6912287: mca_btl_tcp_send (in /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)

==4225==    by 0x68FA38D: mca_pml_ob1_send_request_start_rndv (in /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)

==4225==    by 0x68F426F: mca_pml_ob1_send (in /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)

==4225==    by 0x58B3CCF: PMPI_Send (in /home/mig/openmpi-1.2.1a-install/lib/libmpi.so.0.0.0)

==4225==    by 0x8549C05: GP::MPIProcessGroup::send(GP::MemoryMap const&, int, int) (gpProcessGroup.cpp:296)

==4225==    by 0x8549AA0: GP::MPIProcessGroup::send(boost::shared_ptr<GP::Entity>, int, int) (gpProcessGroup.cpp:271)

==4225==    by 0x88286B1: GP::ParallelDataAccessor::load(boost::shared_ptr<GP::Entity>) (gpParallelDataAccessor.cpp:105)

==4225==    by 0x86F7A5A: GP::Transactions::create(xercesc_2_7::DOMNode const*) (gpTransaction.cpp:993)

==4225==    by 0x85461B3: GP::Factory<boost::shared_ptr<GP::XmlBase>, std::string, boost::shared_ptr<GP::XmlBase> (*)(xercesc_2_7::DOMNode const*), GP::DefaultFactoryError>::createObject(xercesc_2_7::DOMNode const*) (gpXmlFactoryParser.cpp:46)

==4225==  Address 0x8AD6970 is not stack'd, malloc'd or (recently) free'd

==4226==

==4226== Syscall param writev(vector[...]) points to uninitialised byte(s)

==4226==    at 0x4000792: (within /lib/ld-2.3.6.so)

==4226==    by 0x6915CE5: mca_btl_tcp_frag_send (in /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)

==4226==    by 0x6914CB3: mca_btl_tcp_endpoint_send (in /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)

==4226==    by 0x6912287: mca_btl_tcp_send (in /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)

==4226==    by 0x68F6C02: mca_pml_ob1_recv_request_ack_send_btl (in /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)

==4226==    by 0x68F7CB6: mca_pml_ob1_recv_request_progress (in /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)

==4226==    by 0x68F5D77: mca_pml_ob1_recv_frag_match (in /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)

==4226==    by 0x6915178: mca_btl_tcp_endpoint_recv_handler (in /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)

==4226==    by 0x5941710: opal_event_base_loop (in /home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)

==4226==    by 0x59418F8: opal_event_loop (in /home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)

==4226==    by 0x593BFFD: opal_progress (in /home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)

==4226==    by 0x68F3084: mca_pml_ob1_recv (in /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)

==4226==  Address 0x8AD6970 is not stack'd, malloc'd or (recently) free'd

==4225==

==4225== Invalid read of size 1

==4225==    at 0x401ECD0: memcpy (mc_replace_strmem.c:406)

==4225==    by 0x589216E: ompi_generic_simple_pack (in /home/mig/openmpi-1.2.1a-install/lib/libmpi.so.0.0.0)

==4225==    by 0x5895F1A: ompi_convertor_pack (in /home/mig/openmpi-1.2.1a-install/lib/libmpi.so.0.0.0)

==4225==    by 0x6911A07: mca_btl_tcp_prepare_src (in /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)

==4225==    by 0x68F9FDB: mca_pml_ob1_send_request_schedule_exclusive (in /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)

==4225==    by 0x68FB717: mca_pml_ob1_frag_completion (in /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)

==4225==    by 0x6915905: mca_btl_tcp_endpoint_send_handler (in /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)

==4225==    by 0x5941710: opal_event_base_loop (in /home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)

==4225==    by 0x59418F8: opal_event_loop (in /home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)

==4225==    by 0x593BFFD: opal_progress (in /home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)

==4225==    by 0x68F4384: mca_pml_ob1_send (in /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)

==4225==    by 0x58B3CCF: PMPI_Send (in /home/mig/openmpi-1.2.1a-install/lib/libmpi.so.0.0.0)

==4225==  Address 0x1 is not stack'd, malloc'd or (recently) free'd

[head:04225] *** Process received signal ***

[head:04225] Signal: Segmentation fault (11)

[head:04225] Signal code: Address not mapped (1)

[head:04225] Failing at address: 0x1

[head:04225] [ 0] /lib/tls/i686/cmov/libpthread.so.0 [0x5ac3bd0]

[head:04225] [ 1] /home/mig/openmpi-1.2.1a-install/lib/libmpi.so.0(ompi_generic_simple_pack+0x2ff) [0x589216f]

[head:04225] [ 2] /home/mig/openmpi-1.2.1a-install/lib/libmpi.so.0(ompi_convertor_pack+0x12b) [0x5895f1b]

[head:04225] [ 3] /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_prepare_src+0x198) [0x6911a08]

[head:04225] [ 4] /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send_request_schedule_exclusive+0x18c) [0x68f9fdc]

[head:04225] [ 5] /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so [0x68fb718]

[head:04225] [ 6] /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so [0x6915906]

[head:04225] [ 7] /home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0(opal_event_base_loop+0x391) [0x5941711]

[head:04225] [ 8] /home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0(opal_event_loop+0x29) [0x59418f9]

[head:04225] [ 9] /home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0(opal_progress+0xbe) [0x593bffe]

[head:04225] [10] /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0x5d5) [0x68f4385]

[head:04225] [11] /home/mig/openmpi-1.2.1a-install/lib/libmpi.so.0(MPI_Send+0x170) [0x58b3cd0]

[head:04225] [12] /opt/plato/release_1.0/bin/engine(_ZN2GP15MPIProcessGroup4sendERKNS_9MemoryMapEii+0xee) [0x8549c06]

[head:04225] [13] /opt/plato/release_1.0/bin/engine(_ZN2GP15MPIProcessGroup4sendEN5boost10shared_ptrINS_6EntityEEEii+0x165) [0x8549aa1]

[head:04225] [14] /opt/plato/release_1.0/bin/engine(_ZN2GP20ParallelDataAccessor4loadEN5boost10shared_ptrINS_6EntityEEE+0x15c) [0x88286b2]

[head:04225] [15] /opt/plato/release_1.0/bin/engine(_ZN2GP12Transactions6createEPKN11xercesc_2_77DOMNodeE+0x361) [0x86f7a5b]

[head:04225] [16] /opt/plato/release_1.0/bin/engine(_ZN2GP7FactoryIN5boost10shared_ptrINS_7XmlBaseEEESsPFS4_PKN11xercesc_2_77DOMNodeEENS_19DefaultFactoryErrorEE12createObjectES8_+0xde) [0x85461b4]

[head:04225] [17] /opt/plato/release_1.0/bin/engine(_ZN2GP16XmlFactoryParser7descentEPN11xercesc_2_77DOMNodeEb+0x5c9) [0x85469d7]

[head:04225] [18] /opt/plato/release_1.0/bin/engine(_ZN2GP9XmlParser8traverseEb+0x1d8) [0x8542c90]

[head:04225] [19] /opt/plato/release_1.0/bin/engine(_ZN2GP16XmlFactoryParser8traverseEb+0x1f) [0x85463ff]

[head:04225] [20] /opt/plato/release_1.0/bin/engine(main+0x1566) [0x84d02b0]

[head:04225] [21] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xc8) [0x5adeea8]

[head:04225] [22] /opt/plato/release_1.0/bin/engine(__gxx_personality_v0+0x1ad) [0x84cbe31]

[head:04225] *** End of error message ***

==4225==

==4225== Process terminating with default action of signal 11 (SIGSEGV)

==4225==  Access not within mapped region at address 0x1

==4225==    at 0x401ECD0: memcpy (mc_replace_strmem.c:406)

==4225==    by 0x589216E: ompi_generic_simple_pack (in /home/mig/openmpi-1.2.1a-install/lib/libmpi.so.0.0.0)

==4225==    by 0x5895F1A: ompi_convertor_pack (in /home/mig/openmpi-1.2.1a-install/lib/libmpi.so.0.0.0)

==4225==    by 0x6911A07: mca_btl_tcp_prepare_src (in /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)

==4225==    by 0x68F9FDB: mca_pml_ob1_send_request_schedule_exclusive (in /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)

==4225==    by 0x68FB717: mca_pml_ob1_frag_completion (in /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)

==4225==    by 0x6915905: mca_btl_tcp_endpoint_send_handler (in /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_btl_tcp.so)

==4225==    by 0x5941710: opal_event_base_loop (in /home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)

==4225==    by 0x59418F8: opal_event_loop (in /home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)

==4225==    by 0x593BFFD: opal_progress (in /home/mig/openmpi-1.2.1a-install/lib/libopen-pal.so.0.0.0)

==4225==    by 0x68F4384: mca_pml_ob1_send (in /home/mig/openmpi-1.2.1a-install/lib/openmpi/mca_pml_ob1.so)

==4225==    by 0x58B3CCF: PMPI_Send (in /home/mig/openmpi-1.2.1a-install/lib/libmpi.so.0.0.0)

==4225==

==4225== ERROR SUMMARY: 239 errors from 43 contexts (suppressed: 63 from 1)

==4225== malloc/free: in use at exit: 51,796,553 bytes in 138,247 blocks.

==4225== malloc/free: 6,837,039 allocs, 6,698,792 frees, 293,385,873 bytes allocated.

==4225== For counts of detected errors, rerun with: -v

==4225== searching for pointers to 138,247 not-freed blocks.

==4225== checked 164,732,924 bytes.

==4225==

==4225== LEAK SUMMARY:

==4225==    definitely lost: 4,455 bytes in 249 blocks.

==4225==      possibly lost: 226,109 bytes in 5,823 blocks.

==4225==    still reachable: 51,565,989 bytes in 132,175 blocks.

==4225==         suppressed: 0 bytes in 0 blocks.

==4225== Use --leak-check=full to see details of leaked memory.

==4226==

==4226== ERROR SUMMARY: 227 errors from 41 contexts (suppressed: 63 from 1)

==4226== malloc/free: in use at exit: 50,764,389 bytes in 138,250 blocks.

==4226== malloc/free: 143,586 allocs, 5,336 frees, 103,910,477 bytes allocated.

==4226== For counts of detected errors, rerun with: -v

==4227==

==4227== ERROR SUMMARY: 225 errors from 40 contexts (suppressed: 63 from 1)

==4227== malloc/free: in use at exit: 515,706 bytes in 11,582 blocks.

==4227== malloc/free: 16,868 allocs, 5,286 frees, 4,054,035 bytes allocated.

==4227== For counts of detected errors, rerun with: -v

==4227== searching for pointers to 11,582 not-freed blocks.

==4227== checked 7,295,768 bytes.

==4227==

==4227== LEAK SUMMARY:

==4227==    definitely lost: 4,455 bytes in 249 blocks.

==4227==      possibly lost: 226,109 bytes in 5,823 blocks.

==4227==    still reachable: 285,142 bytes in 5,510 blocks.

==4227==         suppressed: 0 bytes in 0 blocks.