Open MPI User's Mailing List Archives

Subject: [OMPI users] openmpi 1.5.4 paffinity with Magny-Cours
From: Kaizaad Bilimorya (kaizaad_at_[hidden])
Date: 2011-09-09 15:03:44


We seem to have an issue similar to the one reported in this thread:

"Bug in openmpi 1.5.4 in paffinity"
http://www.open-mpi.org/community/lists/users/2011/09/17151.php

We are using the following version of hwloc (from the EPEL repo; we run CentOS 5.6):

$ hwloc-info --version
hwloc-info 1.1rc6

A simple "mpi_hello" program works fine with cpusets and Open MPI 1.4.2, but
with Open MPI 1.5.4 and cpusets we get the following segfault (it works fine
on the node when cpusets are not enabled):

[red2:28263] *** Process received signal ***
[red2:28263] Signal: Segmentation fault (11)
[red2:28263] Signal code: Address not mapped (1)
[red2:28263] Failing at address: 0x8
[red2:28263] [ 0] /lib64/libpthread.so.0 [0x2b3dce315b10]
[red2:28263] [ 1] /opt/sharcnet/openmpi/1.5.4/intel/lib/openmpi/mca_paffinity_hwloc.so(opal_paffinity_hwloc_bitmap_or+0x142) [0x2b3dcef75cb2]
[red2:28263] [ 2] /opt/sharcnet/openmpi/1.5.4/intel/lib/openmpi/mca_paffinity_hwloc.so [0x2b3dcef71404]
[red2:28263] [ 3] /opt/sharcnet/openmpi/1.5.4/intel/lib/openmpi/mca_paffinity_hwloc.so [0x2b3dcef6bb26]
[red2:28263] [ 4] /opt/sharcnet/openmpi/1.5.4/intel/lib/openmpi/mca_paffinity_hwloc.so(opal_paffinity_hwloc_topology_load+0xe2) [0x2b3dcef6e0b2]
[red2:28263] [ 5] /opt/sharcnet/openmpi/1.5.4/intel/lib/openmpi/mca_paffinity_hwloc.so [0x2b3dcef68b72]
[red2:28263] [ 6] /opt/sharcnet/openmpi/1.5.4/intel/lib/libopen-rte.so.3(mca_base_components_open+0x302) [0x2b3dcd2b08f2]
[red2:28263] [ 7] /opt/sharcnet/openmpi/1.5.4/intel/lib/libopen-rte.so.3(opal_paffinity_base_open+0x67) [0x2b3dcd2d3a87]
[red2:28263] [ 8] /opt/sharcnet/openmpi/1.5.4/intel/lib/libopen-rte.so.3(opal_init+0x71) [0x2b3dcd28bfb1]
[red2:28263] [ 9] /opt/sharcnet/openmpi/1.5.4/intel/lib/libopen-rte.so.3(orte_init+0x23) [0x2b3dcd2318f3]
[red2:28263] [10] /opt/sharcnet/openmpi/1.5.4/intel/bin/mpirun [0x4049b5]
[red2:28263] [11] /opt/sharcnet/openmpi/1.5.4/intel/bin/mpirun [0x404388]
[red2:28263] [12] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b3dce540994]
[red2:28263] [13] /opt/sharcnet/openmpi/1.5.4/intel/bin/mpirun [0x4042b9]
[red2:28263] *** End of error message ***
/var/spool/torque/mom_priv/jobs/968.SC: line 3: 28263 Segmentation fault /opt/sharcnet/openmpi/1.5.4/intel/bin/mpirun -np 2 ./a.out
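
Since the crash shows up only when cpusets are enabled, it may help to record which CPUs the job is actually allowed to use on the node. The following is only a rough sketch, not part of the original report: it assumes a Linux kernel that exposes a Cpus_allowed_list line in /proc/self/status and a Python 3 interpreter on the node, and the helper names parse_cpu_list and allowed_cpus are made up for illustration.

```python
def parse_cpu_list(text):
    """Expand a kernel-style CPU list such as "0-3,8" into [0, 1, 2, 3, 8]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty list or a stray comma
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def allowed_cpus(status_path="/proc/self/status"):
    """Return the CPUs this process may run on, or [] if the field is absent."""
    with open(status_path) as status:
        for line in status:
            if line.startswith("Cpus_allowed_list:"):
                return parse_cpu_list(line.split(":", 1)[1])
    return []


if __name__ == "__main__":
    # Run inside the batch job so the cpuset applied to the job is what is reported.
    print("allowed CPUs:", allowed_cpus())
```

Running this once inside the Torque job and once from a shell without cpusets should show whether the job sees only a subset of the node's cores, which is the situation in which the segfault appears.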

Please let me know if you need more information about this issue.

thanks
-k