Open MPI User's Mailing List Archives

Subject: [OMPI users] Open-MPI 1.4.2 : mpirun core-dumps when "-npernode N" is used at command line
From: Michael E. Thomadakis (miket7777_at_[hidden])
Date: 2010-08-23 20:20:27


  Hello OMPI:

We have installed OMPI V1.4.2 on a Nehalem cluster running CentOS5.4.
OMPI was built using Intel compilers 11.1.072. I am attaching the
configuration log and output from ompi_info -a.

The problem we are encountering is that whenever we use the option
'-npernode N' on the mpirun command line, we get a segmentation fault,
as shown below:

miket_at_login002[pts/7]PS $ mpirun -npernode 1 --display-devel-map
--tag-output -np 6 -cpus-per-proc 2 -H 'login001,login002,login003' hostname

  Map generated by mapping policy: 0402
         Npernode: 1 Oversubscribe allowed: TRUE CPU Lists: FALSE
         Num new daemons: 2 New daemon starting vpid 1
         Num nodes: 3

  Data for node: Name: login001 Launch id: -1 Arch: 0 State: 2
         Num boards: 1 Num sockets/board: 2 Num cores/socket: 4
         Daemon: [[44812,0],1] Daemon launched: False
         Num slots: 1 Slots in use: 2
         Num slots allocated: 1 Max slots: 0
         Username on node: NULL
         Num procs: 1 Next node_rank: 1
         Data for proc: [[44812,1],0]
                 Pid: 0 Local rank: 0 Node rank: 0
                 State: 0 App_context: 0 Slot list: NULL

  Data for node: Name: login002 Launch id: -1 Arch: ffc91200
State: 2
         Num boards: 1 Num sockets/board: 2 Num cores/socket: 4
         Daemon: [[44812,0],0] Daemon launched: True
         Num slots: 1 Slots in use: 2
         Num slots allocated: 1 Max slots: 0
         Username on node: NULL
         Num procs: 1 Next node_rank: 1
         Data for proc: [[44812,1],0]
                 Pid: 0 Local rank: 0 Node rank: 0
                 State: 0 App_context: 0 Slot list: NULL

  Data for node: Name: login003 Launch id: -1 Arch: 0 State: 2
         Num boards: 1 Num sockets/board: 2 Num cores/socket: 4
         Daemon: [[44812,0],2] Daemon launched: False
         Num slots: 1 Slots in use: 2
         Num slots allocated: 1 Max slots: 0
         Username on node: NULL
         Num procs: 1 Next node_rank: 1
         Data for proc: [[44812,1],0]
                 Pid: 0 Local rank: 0 Node rank: 0
                 State: 0 App_context: 0 Slot list: NULL
[login002:02079] *** Process received signal ***
[login002:02079] Signal: Segmentation fault (11)
[login002:02079] Signal code: Address not mapped (1)
[login002:02079] Failing at address: 0x50
[login002:02079] [ 0] /lib64/libpthread.so.0 [0x3569a0e7c0]
[login002:02079] [ 1]
/g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_util_encode_pidmap+0xa7)
[0x2afa70d25de7]
[login002:02079] [ 2]
/g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_odls_base_default_get_add_procs_data+0x3b8)
[0x2afa70d36088]
[login002:02079] [ 3]
/g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0xd7)
[0x2afa70d37fc7]
[login002:02079] [ 4]
/g/software/openmpi-1.4.2/intel/lib/openmpi/mca_plm_rsh.so [0x2afa721085a1]
[login002:02079] [ 5] mpirun [0x404c27]
[login002:02079] [ 6] mpirun [0x403e38]
[login002:02079] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4)
[0x3568e1d994]
[login002:02079] [ 8] mpirun [0x403d69]
[login002:02079] *** End of error message ***
Segmentation fault

We tried version 1.4.1 and this problem did not occur.

This option is necessary when our users launch hybrid MPI-OpenMP code,
where they request M nodes and n processes per node (ppn) in a
*PBS/Torque* setup so that they get exactly the right number of MPI
tasks. Unfortunately, as soon as we use the '-npernode N' option,
mpirun crashes.
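
For context, a typical hybrid job on our cluster is submitted roughly
like the sketch below (the script name, node/core counts, and binary
name are illustrative, not taken from an actual job):

  #!/bin/bash
  #PBS -l nodes=3:ppn=8        # request 3 nodes, 8 cores per node (illustrative)
  #PBS -N hybrid_test
  cd $PBS_O_WORKDIR
  # One MPI rank per node; the remaining cores are used by OpenMP threads.
  export OMP_NUM_THREADS=8
  mpirun -npernode 1 ./hybrid_app   # works with 1.4.1, segfaults with 1.4.2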

Is this a known issue? I found a related problem (from around May 2010)
where people were using the same option, but in a SLURM environment.

regards

Michael