Open MPI User's Mailing List Archives

Subject: [OMPI users] Seg fault with PBS Pro 10.2
From: Repsher, Stephen J (stephen.j.repsher_at_[hidden])
Date: 2010-02-12 10:50:14


Hello,

I'm having problems running Open MPI jobs under PBS Pro 10.2. I've configured and built Open MPI 1.4.1 with the Intel 11.1 compiler on Linux with --with-tm support, and the build completes without errors. I've also built with static libraries per the FAQ suggestion, since libpbs is static. However, my test application keeps failing with a segmentation fault, but ONLY when I select more than 1 node. Running on a single node within PBS works fine. Running outside of PBS via ssh also works fine, even across multiple nodes. OpenIB support is enabled, but that doesn't seem to be the cause, because the job still fails with the --mca btl tcp,self flag. Here is the error I'm getting:

[n34:26892] *** Process received signal ***
[n34:26892] Signal: Segmentation fault (11)
[n34:26892] Signal code: Address not mapped (1)
[n34:26892] Failing at address: 0x3f
[n34:26892] [ 0] /lib64/libpthread.so.0 [0x7fc0309d6a90]
[n34:26892] [ 1] /part0/apps/MPI/intel/openmpi-1.4.1/bin/pbs_mpirun(discui_+0x84) [0x476a50]
[n34:26892] [ 2] /part0/apps/MPI/intel/openmpi-1.4.1/bin/pbs_mpirun(diswsi+0xc3) [0x474063]
[n34:26892] [ 3] /part0/apps/MPI/intel/openmpi-1.4.1/bin/pbs_mpirun [0x471d0c]
[n34:26892] [ 4] /part0/apps/MPI/intel/openmpi-1.4.1/bin/pbs_mpirun(tm_init+0x1fe) [0x471ff8]
[n34:26892] [ 5] /part0/apps/MPI/intel/openmpi-1.4.1/bin/pbs_mpirun [0x43f580]
[n34:26892] [ 6] /part0/apps/MPI/intel/openmpi-1.4.1/bin/pbs_mpirun [0x413921]
[n34:26892] [ 7] /part0/apps/MPI/intel/openmpi-1.4.1/bin/pbs_mpirun [0x412b78]
[n34:26892] [ 8] /lib64/libc.so.6(__libc_start_main+0xe6) [0x7fc03068d586]
[n34:26892] [ 9] /part0/apps/MPI/intel/openmpi-1.4.1/bin/pbs_mpirun [0x412ac9]
[n34:26892] *** End of error message ***
Segmentation fault

(NOTE: pbs_mpirun is equivalent to orterun, mpirun, etc.)
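
For reference, a minimal sketch of the setup described above. The install prefix is taken from the trace, but the PBS install path (/path/to/pbs), the compiler variables, the select line, and the test program name (./mpi_test) are placeholders assumed for illustration, not the exact values used:

```sh
# --- Build (run once in the Open MPI 1.4.1 source tree) ---
# Intel compilers, PBS TM support, static libraries only.
# /path/to/pbs is a placeholder for the PBS Pro install directory.
./configure CC=icc CXX=icpc F77=ifort FC=ifort \
    --prefix=/part0/apps/MPI/intel/openmpi-1.4.1 \
    --with-tm=/path/to/pbs \
    --enable-static --disable-shared
make all install

# --- Job script (separate file, submitted with qsub) ---
# Requesting more than one node is what triggers the crash;
# the chunk size below is a placeholder.
#PBS -l select=2:ncpus=1
cd "$PBS_O_WORKDIR"
# ./mpi_test is a hypothetical test program name.
# The TCP-only BTL selection rules out OpenIB as the cause.
mpirun --mca btl tcp,self ./mpi_test
```

With select=1 the same job script runs cleanly; only the multi-node request fails.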

Has anyone else seen errors like this within PBS?

============================================
Steve Repsher
Boeing Defense, Space, & Security - Rotorcraft
Aerodynamics/CFD
Phone: (610) 591-1510
Fax: (610) 591-6263
stephen.j.repsher_at_[hidden]