
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] running a ompi 1.4.2 job with -np versus -npernode
From: Christopher Maestas (cdmaestas_at_[hidden])
Date: 2010-05-17 18:25:26


OK. The -np-only run:

---
sh-3.1$ mpirun -np 2 --display-allocation --display-devel-map mpi_hello
======================   ALLOCATED NODES   ======================
 Data for node: Name: cut1n7            Launch id: -1   Arch: ffc91200
 State: 2
        Num boards: 1   Num sockets/board: 2    Num cores/socket: 4
        Daemon: [[51868,0],0]   Daemon launched: True
        Num slots: 1    Slots in use: 0
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Num procs: 0    Next node_rank: 0
 Data for node: Name: cut1n8            Launch id: -1   Arch: 0 State: 2
        Num boards: 1   Num sockets/board: 2    Num cores/socket: 4
        Daemon: Not defined     Daemon launched: False
        Num slots: 0    Slots in use: 0
        Num slots allocated: 0  Max slots: 0
        Username on node: NULL
        Num procs: 0    Next node_rank: 0
=================================================================
 Map generated by mapping policy: 0400
        Npernode: 0     Oversubscribe allowed: TRUE     CPU Lists: FALSE
        Num new daemons: 1      New daemon starting vpid 1
        Num nodes: 2
 Data for node: Name: cut1n7            Launch id: -1   Arch: ffc91200
 State: 2
        Num boards: 1   Num sockets/board: 2    Num cores/socket: 4
        Daemon: [[51868,0],0]   Daemon launched: True
        Num slots: 1    Slots in use: 1
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Num procs: 1    Next node_rank: 1
        Data for proc: [[51868,1],0]
                Pid: 0  Local rank: 0   Node rank: 0
                State: 0        App_context: 0  Slot list: NULL
 Data for node: Name: cut1n8            Launch id: -1   Arch: 0 State: 2
        Num boards: 1   Num sockets/board: 2    Num cores/socket: 4
        Daemon: [[51868,0],1]   Daemon launched: False
        Num slots: 0    Slots in use: 1
        Num slots allocated: 0  Max slots: 0
        Username on node: NULL
        Num procs: 1    Next node_rank: 1
        Data for proc: [[51868,1],1]
                Pid: 0  Local rank: 0   Node rank: 0
                State: 0        App_context: 0  Slot list: NULL
Hello, I am node cut1n8 with rank 1
Hello, I am node cut1n7 with rank 0
---
And here's the output I got before the segfault (using -npernode):
---
sh-3.1$ mpirun -npernode 1 --display-allocation --display-devel-map mpi_hello
======================   ALLOCATED NODES   ======================
 Data for node: Name: cut1n7            Launch id: -1   Arch: ffc91200
 State: 2
        Num boards: 1   Num sockets/board: 2    Num cores/socket: 4
        Daemon: [[51942,0],0]   Daemon launched: True
        Num slots: 1    Slots in use: 0
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Num procs: 0    Next node_rank: 0
 Data for node: Name: cut1n8            Launch id: -1   Arch: 0 State: 2
        Num boards: 1   Num sockets/board: 2    Num cores/socket: 4
        Daemon: Not defined     Daemon launched: False
        Num slots: 0    Slots in use: 0
        Num slots allocated: 0  Max slots: 0
        Username on node: NULL
        Num procs: 0    Next node_rank: 0
=================================================================
 Map generated by mapping policy: 0400
        Npernode: 1     Oversubscribe allowed: TRUE     CPU Lists: FALSE
        Num new daemons: 1      New daemon starting vpid 1
        Num nodes: 2
 Data for node: Name: cut1n7            Launch id: -1   Arch: ffc91200
 State: 2
        Num boards: 1   Num sockets/board: 2    Num cores/socket: 4
        Daemon: [[51942,0],0]   Daemon launched: True
        Num slots: 1    Slots in use: 1
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Num procs: 1    Next node_rank: 1
        Data for proc: [[51942,1],0]
                Pid: 0  Local rank: 0   Node rank: 0
                State: 0        App_context: 0  Slot list: NULL
 Data for node: Name: cut1n8            Launch id: -1   Arch: 0 State: 2
        Num boards: 1   Num sockets/board: 2    Num cores/socket: 4
        Daemon: [[51942,0],1]   Daemon launched: False
        Num slots: 0    Slots in use: 1
        Num slots allocated: 0  Max slots: 0
        Username on node: NULL
        Num procs: 1    Next node_rank: 1
        Data for proc: [[51942,1],0]
                Pid: 0  Local rank: 0   Node rank: 0
                State: 0        App_context: 0  Slot list: NULL
[cut1n7:19375] *** Process received signal ***
[cut1n7:19375] Signal: Segmentation fault (11)
[cut1n7:19375] Signal code: Address not mapped (1)
[cut1n7:19375] Failing at address: 0x50
[cut1n7:19375] [ 0] /lib64/libpthread.so.0 [0x37bda0de80]
[cut1n7:19375] [ 1]
/apps/mpi/openmpi/1.4.2-gcc-4.1.2-may.12.10/lib/libopen-rte.so.0(orte_util_encode_pidmap+0xdb)
[0x2aed0f93af8b]
[cut1n7:19375] [ 2]
/apps/mpi/openmpi/1.4.2-gcc-4.1.2-may.12.10/lib/libopen-rte.so.0(orte_odls_base_default_get_add_procs_data+0x655)
[0x2aed0f9462f5]
[cut1n7:19375] [ 3]
/apps/mpi/openmpi/1.4.2-gcc-4.1.2-may.12.10/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0x10b)
[0x2aed0f94d31b]
[cut1n7:19375] [ 4]
/apps/mpi/openmpi/1.4.2-gcc-4.1.2-may.12.10/lib/openmpi/mca_plm_slurm.so
[0x2aed107f6ecf]
[cut1n7:19375] [ 5] mpirun [0x40335a]
[cut1n7:19375] [ 6] mpirun [0x4029f3]
[cut1n7:19375] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4) [0x37bce1d8b4]
[cut1n7:19375] [ 8] mpirun [0x402929]
[cut1n7:19375] *** End of error message ***
Segmentation fault
---
I'll look into a SLURM version update.  Previously, SLURM 1.0.30 and Open
MPI 1.3.2 were working together fine.  I'm just curious what was giving me
heartache here ...
On Mon, May 17, 2010 at 4:06 PM, Ralph Castain <rhc_at_[hidden]> wrote:
> That's a pretty old version of slurm - I don't have access to anything that
> old to test against. You could try running it with --display-allocation
> --display-devel-map to see what ORTE thinks the allocation is and how it
> mapped the procs. It sounds like something may be having a problem there...
>
>
> On Mon, May 17, 2010 at 11:08 AM, Christopher Maestas <cdmaestas_at_[hidden]
> > wrote:
>
>> Hello,
>>
>> I've been having some trouble with Open MPI 1.4.X and SLURM recently.  I
>> seem to be able to run jobs this way OK:
>> ---
>> sh-3.1$ mpirun -np 2 mpi_hello
>> Hello, I am node cut1n7 with rank 0
>> Hello, I am node cut1n8 with rank 1
>> --
>>
>> However, if I try to use the -npernode option I get:
>> ---
>> sh-3.1$ mpirun -npernode 1 mpi_hello
>> [cut1n7:16368] *** Process received signal ***
>> [cut1n7:16368] Signal: Segmentation fault (11)
>> [cut1n7:16368] Signal code: Address not mapped (1)
>> [cut1n7:16368] Failing at address: 0x50
>> [cut1n7:16368] [ 0] /lib64/libpthread.so.0 [0x37bda0de80]
>> [cut1n7:16368] [ 1]
>> /apps/mpi/openmpi/1.4.2-gcc-4.1.2-may.12.10/lib/libopen-rte.so.0(orte_util_encode_pidmap+0xdb)
>> [0x2b73eb84df8b]
>> [cut1n7:16368] [ 2]
>> /apps/mpi/openmpi/1.4.2-gcc-4.1.2-may.12.10/lib/libopen-rte.so.0(orte_odls_base_default_get_add_procs_data+0x655)
>> [0x2b73eb8592f5]
>> [cut1n7:16368] [ 3]
>> /apps/mpi/openmpi/1.4.2-gcc-4.1.2-may.12.10/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0x10b)
>> [0x2b73eb86031b]
>> [cut1n7:16368] [ 4]
>> /apps/mpi/openmpi/1.4.2-gcc-4.1.2-may.12.10/lib/openmpi/mca_plm_slurm.so
>> [0x2b73ec709ecf]
>> [cut1n7:16368] [ 5] mpirun [0x40335a]
>> [cut1n7:16368] [ 6] mpirun [0x4029f3]
>> [cut1n7:16368] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4)
>> [0x37bce1d8b4]
>> [cut1n7:16368] [ 8] mpirun [0x402929]
>> [cut1n7:16368] *** End of error message ***
>> Segmentation fault
>> ---
>>
>> This is ompi 1.4.2, gcc 4.1.1 and slurm 2.0.9 ... I'm sure it's a rather
>> silly detail on my end, but figure I should start this thread for any
>> insights and feedback I can help provide to resolve this.
>>
>> Thanks,
>> -cdm
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>