Subject: Re: [OMPI users] Open-MPI 1.4.2 : mpirun core-dumps when "-npernode N" is used at command line
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-08-23 20:30:36


Yes, the -npernode segv is a known issue.

We have it fixed in the 1.4.x nightly tarballs; can you give it a whirl and see if that fixes your problem?

    http://www.open-mpi.org/nightly/v1.4/
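
In case it is useful, here is a rough sketch of pulling a nightly tarball and testing it alongside your existing install (the tarball filename, install prefix, and compiler settings below are placeholders, not your site's exact values; the real filename changes every night):

    # Grab the latest tarball listed at the URL above; the name here is a placeholder.
    wget http://www.open-mpi.org/nightly/v1.4/openmpi-1.4.X-nightly.tar.bz2
    tar xjf openmpi-1.4.X-nightly.tar.bz2
    cd openmpi-1.4.X-nightly

    # Build with the Intel compilers into a scratch prefix so the existing
    # 1.4.2 install is left untouched.
    ./configure CC=icc CXX=icpc F77=ifort FC=ifort \
        --prefix=$HOME/ompi-1.4-nightly
    make -j8 all install

    # Re-run the failing command with the new mpirun.
    $HOME/ompi-1.4-nightly/bin/mpirun -npernode 1 -np 6 \
        -H login001,login002,login003 hostname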

On Aug 23, 2010, at 8:20 PM, Michael E. Thomadakis wrote:

> Hello OMPI:
>
> We have installed OMPI V1.4.2 on a Nehalem cluster running CentOS 5.4. OMPI was built using the Intel compilers 11.1.072. I am attaching the configuration log and the output of ompi_info -a.
>
> The problem we are encountering is that whenever we use the option '-npernode N' on the mpirun command line, we get a segmentation fault, as shown below:
>
>
> miket_at_login002[pts/7]PS $ mpirun -npernode 1 --display-devel-map --tag-output -np 6 -cpus-per-proc 2 -H 'login001,login002,login003' hostname
>
> Map generated by mapping policy: 0402
> Npernode: 1 Oversubscribe allowed: TRUE CPU Lists: FALSE
> Num new daemons: 2 New daemon starting vpid 1
> Num nodes: 3
>
> Data for node: Name: login001 Launch id: -1 Arch: 0 State: 2
> Num boards: 1 Num sockets/board: 2 Num cores/socket: 4
> Daemon: [[44812,0],1] Daemon launched: False
> Num slots: 1 Slots in use: 2
> Num slots allocated: 1 Max slots: 0
> Username on node: NULL
> Num procs: 1 Next node_rank: 1
> Data for proc: [[44812,1],0]
> Pid: 0 Local rank: 0 Node rank: 0
> State: 0 App_context: 0 Slot list: NULL
>
> Data for node: Name: login002 Launch id: -1 Arch: ffc91200 State: 2
> Num boards: 1 Num sockets/board: 2 Num cores/socket: 4
> Daemon: [[44812,0],0] Daemon launched: True
> Num slots: 1 Slots in use: 2
> Num slots allocated: 1 Max slots: 0
> Username on node: NULL
> Num procs: 1 Next node_rank: 1
> Data for proc: [[44812,1],0]
> Pid: 0 Local rank: 0 Node rank: 0
> State: 0 App_context: 0 Slot list: NULL
>
> Data for node: Name: login003 Launch id: -1 Arch: 0 State: 2
> Num boards: 1 Num sockets/board: 2 Num cores/socket: 4
> Daemon: [[44812,0],2] Daemon launched: False
> Num slots: 1 Slots in use: 2
> Num slots allocated: 1 Max slots: 0
> Username on node: NULL
> Num procs: 1 Next node_rank: 1
> Data for proc: [[44812,1],0]
> Pid: 0 Local rank: 0 Node rank: 0
> State: 0 App_context: 0 Slot list: NULL
> [login002:02079] *** Process received signal ***
> [login002:02079] Signal: Segmentation fault (11)
> [login002:02079] Signal code: Address not mapped (1)
> [login002:02079] Failing at address: 0x50
> [login002:02079] [ 0] /lib64/libpthread.so.0 [0x3569a0e7c0]
> [login002:02079] [ 1] /g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_util_encode_pidmap+0xa7) [0x2afa70d25de7]
> [login002:02079] [ 2] /g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_odls_base_default_get_add_procs_data+0x3b8) [0x2afa70d36088]
> [login002:02079] [ 3] /g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0xd7) [0x2afa70d37fc7]
> [login002:02079] [ 4] /g/software/openmpi-1.4.2/intel/lib/openmpi/mca_plm_rsh.so [0x2afa721085a1]
> [login002:02079] [ 5] mpirun [0x404c27]
> [login002:02079] [ 6] mpirun [0x403e38]
> [login002:02079] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3568e1d994]
> [login002:02079] [ 8] mpirun [0x403d69]
> [login002:02079] *** End of error message ***
> Segmentation fault
>
> We tried version 1.4.1 and this problem did not emerge.
>
> This option is necessary when our users launch hybrid MPI-OpenMP codes: in a PBS/Torque setup they request M nodes with n ppn, and -npernode lets them get exactly the right number of MPI tasks per node. Unfortunately, as soon as we use the '-npernode N' option, mpirun crashes.
>
> Is this a known issue? I found a related problem (from around May 2010) where people were using the same option, but in a SLURM environment.
>
> regards
>
> Michael
>
> <config.log.gz> <ompi_info-a.out.gz>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
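
For reference, a minimal PBS/Torque submission sketch of the hybrid MPI/OpenMP pattern described above (the resource values, thread count, and ./hybrid_app binary name are made-up examples, not anything from the original report):

    #!/bin/bash
    #PBS -N hybrid_test
    #PBS -l nodes=4:ppn=8        # M=4 nodes, n=8 cores per node (example values)
    #PBS -l walltime=00:10:00

    cd $PBS_O_WORKDIR

    # One MPI rank per node; each rank fans out into 8 OpenMP threads.
    export OMP_NUM_THREADS=8
    mpirun -npernode 1 ./hybrid_app

With Open MPI built with Torque (tm) support, mpirun picks up the allocated node list from the job itself, so -npernode 1 yields exactly one rank per allocated node.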

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/