Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] openmpi 1.5.4 paffinity with Magny-Cours
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2011-09-09 15:15:17


On 09/09/2011 21:03, Kaizaad Bilimorya wrote:
>
> We seem to have an issue similar to this thread
>
> "Bug in openmpi 1.5.4 in paffinity"
> http://www.open-mpi.org/community/lists/users/2011/09/17151.php
>
> Using the following version of hwloc (from EPEL repo - we run CentOS 5.6)
>
> $ hwloc-info --version
> hwloc-info 1.1rc6

Hello,

Note that Open MPI 1.5.4 uses its own embedded copy of hwloc 1.2.0.

Your own 1.1rc6 should actually work fine (does lstopo crash?), but OMPI
cannot use it :)

> A simple "mpi_hello" program works fine with cpusets and openMPI 1.4.2
> but with openMPI 1.5.3 and cpusets we get the following segfault
> (works fine on the node without cpusets enabled):
>
> [red2:28263] *** Process received signal ***
> [red2:28263] Signal: Segmentation fault (11)
> [red2:28263] Signal code: Address not mapped (1)
> [red2:28263] Failing at address: 0x8
> [red2:28263] [ 0] /lib64/libpthread.so.0 [0x2b3dce315b10]
> [red2:28263] [ 1]
> /opt/sharcnet/openmpi/1.5.4/intel/lib/openmpi/mca_paffinity_hwloc.so(opal_paffinity_hwloc_bitmap_or+0x142)
> [0x2b3dcef75cb2]
> [red2:28263] [ 2]
> /opt/sharcnet/openmpi/1.5.4/intel/lib/openmpi/mca_paffinity_hwloc.so
> [0x2b3dcef71404]
> [red2:28263] [ 3]
> /opt/sharcnet/openmpi/1.5.4/intel/lib/openmpi/mca_paffinity_hwloc.so
> [0x2b3dcef6bb26]
> [red2:28263] [ 4]
> /opt/sharcnet/openmpi/1.5.4/intel/lib/openmpi/mca_paffinity_hwloc.so(opal_paffinity_hwloc_topology_load+0xe2)
> [0x2b3dcef6e0b2]
> [red2:28263] [ 5]
> /opt/sharcnet/openmpi/1.5.4/intel/lib/openmpi/mca_paffinity_hwloc.so
> [0x2b3dcef68b72]
> [red2:28263] [ 6]
> /opt/sharcnet/openmpi/1.5.4/intel/lib/libopen-rte.so.3(mca_base_components_open+0x302)
> [0x2b3dcd2b08f2]
> [red2:28263] [ 7]
> /opt/sharcnet/openmpi/1.5.4/intel/lib/libopen-rte.so.3(opal_paffinity_base_open+0x67)
> [0x2b3dcd2d3a87]
> [red2:28263] [ 8]
> /opt/sharcnet/openmpi/1.5.4/intel/lib/libopen-rte.so.3(opal_init+0x71)
> [0x2b3dcd28bfb1]
> [red2:28263] [ 9]
> /opt/sharcnet/openmpi/1.5.4/intel/lib/libopen-rte.so.3(orte_init+0x23)
> [0x2b3dcd2318f3]
> [red2:28263] [10] /opt/sharcnet/openmpi/1.5.4/intel/bin/mpirun [0x4049b5]
> [red2:28263] [11] /opt/sharcnet/openmpi/1.5.4/intel/bin/mpirun [0x404388]
> [red2:28263] [12] /lib64/libc.so.6(__libc_start_main+0xf4)
> [0x2b3dce540994]
> [red2:28263] [13] /opt/sharcnet/openmpi/1.5.4/intel/bin/mpirun [0x4042b9]
> [red2:28263] *** End of error message ***
> /var/spool/torque/mom_priv/jobs/968.SC: line 3: 28263 Segmentation
> fault /opt/sharcnet/openmpi/1.5.4/intel/bin/mpirun -np 2 ./a.out
>
> Please let me know if you need more information about this issue

This looks like the exact same issue. Did you try the patch(es) I sent
earlier?
See http://www.open-mpi.org/community/lists/users/2011/09/17159.php
If that is not enough, try also applying the other patch from
http://www.open-mpi.org/community/lists/users/2011/09/17156.php
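
Since the crash only shows up with cpusets enabled, it may help to record
which CPUs the job is actually allowed to use on the failing node. The
following is only a rough sketch, not something taken from this thread: it
assumes a Linux kernel that exposes Cpus_allowed_list in /proc/self/status
and a Python 3 interpreter on the node, and the function names are made up
for illustration.

```python
def allowed_cpus_from_status(path="/proc/self/status"):
    """Return the raw Cpus_allowed_list value from a /proc status file.

    Returns None if the file or the field is not available (assumption:
    older kernels or non-Linux systems may not provide it).
    """
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("Cpus_allowed_list:"):
                    return line.split(":", 1)[1].strip()
    except OSError:
        return None
    return None


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-3,8,10-11' into sorted ints."""
    cpus = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


if __name__ == "__main__":
    raw = allowed_cpus_from_status()
    if raw is None:
        print("Cpus_allowed_list not available on this system")
    else:
        print("allowed CPUs:", parse_cpu_list(raw))
```

Running this inside the batch job and again in a plain shell on the same
node should show whether the cpuset restricts the process to a subset of
the cores.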

Brice