Dear All,
I recently installed version 1.4.2 and am having a problem that seems to be specific to this version. Before I lay out the details, please note that I am building 1.4.2 *exactly* the same way I built 1.4.1: same compiler options, same OpenIB and other system libraries, same configure options, and same everything else. Version 1.4.1 does not have this issue.
The error message is the following:
$ mpirun -pernode ./hello
[n90:21674] *** Process received signal ***
[n90:21674] Signal: Segmentation fault (11)
[n90:21674] Signal code: Address not mapped (1)
[n90:21674] Failing at address: 0x50
[n90:21674] [ 0] /lib64/libpthread.so.0 [0x3654a0e4c0]
[n90:21674] [ 1] /opt/fermi/openmpi/1.4.2/intel/fast/lib/libopen-rte.so.0(orte_util_encode_pidmap+0xa7) [0x2b6b2f299b87]
[n90:21674] [ 2] /opt/fermi/openmpi/1.4.2/intel/fast/lib/libopen-rte.so.0(orte_odls_base_default_get_add_procs_data+0x3ce) [0x2b6b2f2baefe]
[n90:21674] [ 3] /opt/fermi/openmpi/1.4.2/intel/fast/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0xd5) [0x2b6b2f2ce1e5]
[n90:21674] [ 4] /opt/fermi/openmpi/1.4.2/intel/fast/lib/libopen-rte.so.0 [0x2b6b2f2d17ee]
[n90:21674] [ 5] mpirun [0x404cff]
[n90:21674] [ 6] mpirun [0x403e48]
[n90:21674] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3653e1d974]
[n90:21674] [ 8] mpirun [0x403d79]
[n90:21674] *** End of error message ***
Segmentation fault
[n74:21733] [[41942,0],1] routed:binomial: Connection to lifeline [[41942,0],0] lost
This last line is from mpirun, not the executable. The executable is a simple hello world. All is well without the -pernode flag. All is well even when there is only one process per node (say, when PBS allocates the job that way), as long as the -pernode flag is not used.
Attached is the information requested at
http://www.open-mpi.org/community/help/ except for the InfiniBand-specific details. I'll be happy to provide those if necessary, but note that the failure is the same if I use -mca btl self,tcp
Thank you,
Levent