Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Address not mapped segmentation fault with 1.4.2 ...
From: Ralph Castain (rhc_at_[hidden])
Date: 2010-06-10 10:57:08


That error would indicate something wrong with the PBS connection - it is tm_init that is crashing. I note that you pointed --with-tm at a different location than before - was that intentional? It could be something wrong with that PBS build.
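
One quick way to check whether that PBS build is the problem is to take Open MPI out of the picture and call tm_init() directly. A minimal sketch along these lines would do it (adjust the include/lib paths to your /share/apps/pbs/default tree; the TM library name varies between PBS flavors, e.g. -lpbs vs -ltorque). It has to be run from inside a PBS job, since tm_init() talks to the local MOM:

  /* tm_smoke.c - minimal check that tm_init() works against the
   * PBS build pointed to by --with-tm. Run it from inside a PBS job. */
  #include <stdio.h>
  #include <tm.h>

  int main(void)
  {
      struct tm_roots roots;
      int rc = tm_init(NULL, &roots);

      if (rc != TM_SUCCESS) {
          fprintf(stderr, "tm_init failed, rc=%d\n", rc);
          return 1;
      }
      printf("tm_init OK, %d node(s) in this job\n", roots.tm_nnodes);
      tm_finalize();
      return 0;
  }

Compiled against the same tree, for example:

  icc tm_smoke.c -I/share/apps/pbs/default/include -L/share/apps/pbs/default/lib -lpbs -o tm_smoke

If that crashes the same way, the problem is in the PBS installation rather than in Open MPI 1.4.2.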

On Jun 10, 2010, at 8:44 AM, Richard Walsh wrote:

>
> All,
>
> I am upgrading from 1.4.1 to 1.4.2 on two clusters, one with IB and one without.
> I have no problem on the GE cluster without IB, which requires no special configure
> options for IB. 1.4.2 works perfectly there with both the latest Intel and PGI
> compilers.
>
> On the IB system 1.4.1 has worked fine with the following configure line:
>
> ./configure CC=icc CXX=icpc F77=ifort FC=ifort --enable-openib-ibcm --with-openib --prefix=/share/apps/openmpi-intel/1.4.1 --with-tm=/share/apps/pbs/10.1.0.91350
>
> I have now built 1.4.2 with an almost identical line:
>
> $ ./configure CC=icc CXX=icpc F77=ifort FC=ifort --enable-openib-ibcm --with-openib --prefix=/share/apps/openmpi-intel/1.4.2 --with-tm=/share/apps/pbs/default
>
> When I run a basic MPI test program with:
>
> /share/apps/openmpi-intel/1.4.2/bin/mpirun -np 16 -machinefile $PBS_NODEFILE ./hello_mpi.exe
>
> which defaults to using the IB switch, or with:
>
> /share/apps/openmpi-intel/1.4.2/bin/mpirun -mca btl tcp,self -np 16 -machinefile $PBS_NODEFILE ./hello_mpi.exe
>
> which forces the use of GE, I get the same error:
>
> [compute-0-3:22515] *** Process received signal ***
> [compute-0-3:22515] Signal: Segmentation fault (11)
> [compute-0-3:22515] Signal code: Address not mapped (1)
> [compute-0-3:22515] Failing at address: 0x3f
> [compute-0-3:22515] [ 0] /lib64/libpthread.so.0 [0x3639e0e7c0]
> [compute-0-3:22515] [ 1] /share/apps/openmpi-intel/1.4.2/lib/openmpi/mca_plm_tm.so(discui_+0x84) [0x2b7b546dd3d0]
> [compute-0-3:22515] [ 2] /share/apps/openmpi-intel/1.4.2/lib/openmpi/mca_plm_tm.so(diswsi+0xc3) [0x2b7b546da9e3]
> [compute-0-3:22515] [ 3] /share/apps/openmpi-intel/1.4.2/lib/openmpi/mca_plm_tm.so [0x2b7b546d868c]
> [compute-0-3:22515] [ 4] /share/apps/openmpi-intel/1.4.2/lib/openmpi/mca_plm_tm.so(tm_init+0x1fe) [0x2b7b546d8978]
> [compute-0-3:22515] [ 5] /share/apps/openmpi-intel/1.4.2/lib/openmpi/mca_plm_tm.so [0x2b7b546d791c]
> [compute-0-3:22515] [ 6] /share/apps/openmpi-intel/1.4.2/bin/mpirun [0x404c27]
> [compute-0-3:22515] [ 7] /share/apps/openmpi-intel/1.4.2/bin/mpirun [0x403e38]
> [compute-0-3:22515] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x363961d994]
> [compute-0-3:22515] [ 9] /share/apps/openmpi-intel/1.4.2/bin/mpirun [0x403d69]
> [compute-0-3:22515] *** End of error message ***
> /var/spool/PBS/mom_priv/jobs/9909.bob.csi.cuny.edu.SC: line 42: 22515 Segmentation fault /share/apps/openmpi-intel/1.4.2/bin/mpirun -mca btl tcp,self -np 16 -machinefile $PBS_NODEFILE ./hello_mpi.exe
>
> When compiling with the PGI compiler suite I get the same result,
> although the traceback gives less detail. I notice postings suggesting
> that if I disable the memory manager I might be able to get around
> this problem, but that would result in a performance hit on this IB
> system.
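>
> From what I can tell, that would presumably mean reconfiguring with something
> like the line below (untested on my end, and I am not certain of the exact
> option name in 1.4.2):
>
> ./configure CC=icc CXX=icpc F77=ifort FC=ifort --enable-openib-ibcm --with-openib --without-memory-manager --prefix=/share/apps/openmpi-intel/1.4.2 --with-tm=/share/apps/pbs/default
>
> but I would rather find the actual cause than take that hit.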
>
> Have others seen this? Suggestions?
>
> Thanks,
>
> Richard Walsh
> CUNY HPC Center
>
> Richard Walsh
> Parallel Applications and Systems Manager
> CUNY HPC Center, Staten Island, NY
> 718-982-3319
> 612-382-4620
>
> Mighty the Wizard
> Who found me at sunrise
> Sleeping, and woke me
> And learn'd me Magic!
>
> Think green before you print this email.
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users