Open MPI User's Mailing List Archives

Subject: [OMPI users] Bus Error in ompi_free_list_grow
From: Geraldo Veiga (gveiga+openmpi_at_[hidden])
Date: 2008-11-13 10:16:34

Hi to all,

I am using the same subject as a recent message I found in this mailing
list's archives:

There was no follow-up to that one, but I will add this similar report in
case a list member can give us an idea of how to correct it, or whose bug
this might be.

My application behaves as expected when I run it with multiple MPI processes
on a single host of our SGI Altix ICE 8200 InfiniBand cluster. When I run
the same job across multiple hosts using the PBS batch system, the program
terminates with a bus error:

[r1i0n9:09192] *** Process received signal ***
[r1i0n9:09192] Signal: Bus error (7)
[r1i0n9:09192] Signal code: (2)
[r1i0n9:09192] Failing at address: 0x2b67ca0c8c20
[r1i0n9:09192] [ 0] /lib64/ [0x2b67bfdb1c00]
[r1i0n9:09192] [ 1]
[r1i0n9:09192] [ 2]
[r1i0n9:09192] [ 3]
[r1i0n9:09192] [ 4]
[r1i0n9:09192] [ 5] /sw/openmpi_intel/1.2.8/lib/
[r1i0n9:09192] [ 6]
[r1i0n9:09192] [ 7] dsimpletest(dmumps_comm_buffer_mp_dmumps_519_+0x449)
[r1i0n9:09192] [ 8] dsimpletest(dmumps_load_mp_dmumps_512_+0x20b) [0x54fda1]
[r1i0n9:09192] [ 9] dsimpletest(dmumps_251_+0x4995) [0x4d273b]
[r1i0n9:09192] [10] dsimpletest(dmumps_244_+0x808) [0x484e38]
[r1i0n9:09192] [11] dsimpletest(dmumps_142_+0x8717) [0x4bf5eb]
[r1i0n9:09192] [12] dsimpletest(dmumps_+0x1554) [0x43a720]
[r1i0n9:09192] [13] dsimpletest(MAIN__+0x50b) [0x41e4c3]
[r1i0n9:09192] [14] dsimpletest(main+0x3c) [0x683d4c]
[r1i0n9:09192] [15] /lib64/
[r1i0n9:09192] [16] dsimpletest(dtrmv_+0xa1) [0x41df29]
[r1i0n9:09192] *** End of error message ***

Most of the software infrastructure is provided by the Intel propack. Any
hints on where to look further into this bug?
Thanks in advance.
Geraldo Veiga <gveiga_at_[hidden]>