Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] running a ompi 1.4.2 job with -np versus -npernode
From: Ralph Castain (rhc_at_[hidden])
Date: 2010-05-17 18:06:44


That's a pretty old version of SLURM, and I don't have access to anything that
old to test against. You could try running it with --display-allocation
--display-devel-map to see what ORTE thinks the allocation is and how it
mapped the procs. It sounds like something may be going wrong there...
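
For reference, a minimal sketch of that diagnostic run, reusing the failing
-npernode invocation from the report quoted below. The APP variable and the
dry-run fallback are illustrative assumptions, not part of the original
suggestion:

```shell
#!/bin/sh
# APP is a placeholder for the test program; mpi_hello is the one from the report.
APP="${APP:-mpi_hello}"

# Same failing invocation, plus the two diagnostic flags suggested above.
CMD="mpirun --display-allocation --display-devel-map -npernode 1 $APP"

if command -v mpirun >/dev/null 2>&1; then
    # Run this inside the same SLURM allocation that produced the segfault.
    $CMD
else
    # Assumption for illustration: print the command when mpirun is not on PATH.
    echo "would run: $CMD"
fi
```

Comparing this output with the same flags on the working "-np 2" run should
show whether the allocation or the process map differs between the two cases.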

On Mon, May 17, 2010 at 11:08 AM, Christopher Maestas
<cdmaestas_at_[hidden]> wrote:

> Hello,
>
> I've been having some trouble with Open MPI 1.4.x and SLURM recently. I
> seem to be able to run jobs this way OK:
> ---
> sh-3.1$ mpirun -np 2 mpi_hello
> Hello, I am node cut1n7 with rank 0
> Hello, I am node cut1n8 with rank 1
> ---
>
> However, if I try to use the -npernode option, I get:
> ---
> sh-3.1$ mpirun -npernode 1 mpi_hello
> [cut1n7:16368] *** Process received signal ***
> [cut1n7:16368] Signal: Segmentation fault (11)
> [cut1n7:16368] Signal code: Address not mapped (1)
> [cut1n7:16368] Failing at address: 0x50
> [cut1n7:16368] [ 0] /lib64/libpthread.so.0 [0x37bda0de80]
> [cut1n7:16368] [ 1]
> /apps/mpi/openmpi/1.4.2-gcc-4.1.2-may.12.10/lib/libopen-rte.so.0(orte_util_encode_pidmap+0xdb)
> [0x2b73eb84df8b]
> [cut1n7:16368] [ 2]
> /apps/mpi/openmpi/1.4.2-gcc-4.1.2-may.12.10/lib/libopen-rte.so.0(orte_odls_base_default_get_add_procs_data+0x655)
> [0x2b73eb8592f5]
> [cut1n7:16368] [ 3]
> /apps/mpi/openmpi/1.4.2-gcc-4.1.2-may.12.10/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0x10b)
> [0x2b73eb86031b]
> [cut1n7:16368] [ 4]
> /apps/mpi/openmpi/1.4.2-gcc-4.1.2-may.12.10/lib/openmpi/mca_plm_slurm.so
> [0x2b73ec709ecf]
> [cut1n7:16368] [ 5] mpirun [0x40335a]
> [cut1n7:16368] [ 6] mpirun [0x4029f3]
> [cut1n7:16368] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4) [0x37bce1d8b4]
> [cut1n7:16368] [ 8] mpirun [0x402929]
> [cut1n7:16368] *** End of error message ***
> Segmentation fault
> ---
>
> This is ompi 1.4.2, gcc 4.1.1 and slurm 2.0.9. I'm sure it's a rather
> silly detail on my end, but I figured I should start this thread to get any
> insights, and I'm happy to provide whatever feedback helps resolve this.
>
> Thanks,
> -cdm