
Subject: Re: [OMPI users] Strange behaviour of SGE+OpenMPI
From: PN (poknam_at_[hidden])
Date: 2009-04-01 12:37:57


Thanks.

$ cat hpl-8cpu-test.sge
#!/bin/bash
#
#$ -N HPL_8cpu_GB
#$ -pe orte 8
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
#
/opt/openmpi-gcc/bin/mpirun --display-allocation --display-map -v \
    -np $NSLOTS --host node0001,node0002 hostname

$ cat HPL_8cpu_GB.o46

====================== ALLOCATED NODES ======================

 Data for node: Name: node0001 Num slots: 4 Max slots: 0
 Data for node: Name: node0002.v5cluster.com Num slots: 4 Max slots: 0

=================================================================

 ======================== JOB MAP ========================

 Data for node: Name: node0001 Num procs: 8
        Process OMPI jobid: [10982,1] Process rank: 0
        Process OMPI jobid: [10982,1] Process rank: 1
        Process OMPI jobid: [10982,1] Process rank: 2
        Process OMPI jobid: [10982,1] Process rank: 3
        Process OMPI jobid: [10982,1] Process rank: 4
        Process OMPI jobid: [10982,1] Process rank: 5
        Process OMPI jobid: [10982,1] Process rank: 6
        Process OMPI jobid: [10982,1] Process rank: 7

 =============================================================
node0001
node0001
node0001
node0001
node0001
node0001
node0001
node0001

I'm not sure why node0001 is missing the domain name; could this be related?
However, the hostnames are reported correctly when I run "qconf -sel":

$ qconf -sel
node0001.v5cluster.com
node0002.v5cluster.com
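
For reference, here is the kind of dry run suggested by Ralph and Rolf below
(an untested sketch only; it combines the --do-not-launch idea with the
gridengine verbose flag, using the hostnames from my cluster):

# Read the SGE allocation, apply the --host filter, print the allocation
# and the process map, then exit without launching anything.
/opt/openmpi-gcc/bin/mpirun --display-allocation --display-map --do-not-launch \
    -mca ras_gridengine_verbose 100 -np $NSLOTS --host node0001,node0002 hostname

As Ralph notes, -np should stay at $NSLOTS so the mapper runs exactly as it
would for the real job, even though nothing is launched.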

2009/4/1 Ralph Castain <rhc_at_[hidden]>

> Rolf has correctly reminded me that display-allocation occurs prior to host
> filtering, so you will see all of the allocated nodes. You'll see the impact
> of the host specifications in display-map.
>
> Sorry for the confusion - thanks to Rolf for pointing it out.
> Ralph
>
>
> On Apr 1, 2009, at 7:40 AM, Ralph Castain wrote:
>
>> As an FYI: you can debug allocation issues more easily by:
>>
>> mpirun --display-allocation --do-not-launch -n 1 foo
>>
>> This will read the allocation, do whatever host filtering you specify with
>> -host and -hostfile options, report out the result, and then terminate
>> without trying to launch anything. I found it most useful for debugging
>> these situations.
>>
>> If you want to know where the procs would have gone, then you can do:
>>
>> mpirun --display-allocation --display-map --do-not-launch -n 8 foo
>>
>> In this case, the #procs you specify needs to be the number you actually
>> wanted so that the mapper will properly run. However, the executable can be
>> bogus and nothing will actually launch. It's the closest you can come to a
>> dry run of a job.
>>
>> HTH
>> Ralph
>>
>>
>> On Apr 1, 2009, at 7:10 AM, Rolf Vandevaart wrote:
>>
>>> It turns out that --host and --hostfile act as a filter for which
>>> nodes to run on when you are running under SGE. So, listing hosts
>>> several times does not affect where the processes land. However, this still
>>> does not explain why you are seeing what you are seeing. One thing you can
>>> try is to add this to the mpirun command.
>>>
>>> -mca ras_gridengine_verbose 100
>>>
>>> This will provide some additional information as to what Open MPI is
>>> seeing as nodes and slots from SGE. (Is there any chance that node0002
>>> actually has 8 slots?)
>>>
>>> I just retried on my cluster of 2-CPU SPARC Solaris nodes. When I run
>>> with np=2, both MPI processes land on a single node, because that
>>> node has two slots. When I go up to np=4, the extra processes move on
>>> to the other node. The --host option acts as a filter on where they
>>> should run.
>>>
>>> In terms of using "IB bonding", I do not know exactly what that means.
>>> Open MPI does stripe over multiple IB interfaces, so I think the
>>> answer is yes.
>>>
>>> Rolf
>>>
>>> PS: Here is what my np=4 job script looked like. (I just changed np=2
>>> for the other run)
>>>
>>> burl-ct-280r-0 148 =>more run.sh
>>> #! /bin/bash
>>> #$ -S /bin/bash
>>> #$ -V
>>> #$ -cwd
>>> #$ -N Job1
>>> #$ -pe orte 200
>>> #$ -j y
>>> #$ -l h_rt=00:20:00 # Run time (hh:mm:ss) - 20 min
>>>
>>> echo $NSLOTS
>>> /opt/SUNWhpc/HPC8.2/sun/bin/mpirun -mca ras_gridengine_verbose 100 -v \
>>>     -np 4 -host burl-ct-280r-1,burl-ct-280r-0 -mca btl self,sm,tcp hostname
>>>
>>> Here is the output (somewhat truncated)
>>> burl-ct-280r-0 150 =>more Job1.o199
>>> 200
>>> [burl-ct-280r-2:22132] ras:gridengine: JOB_ID: 199
>>> [burl-ct-280r-2:22132] ras:gridengine: PE_HOSTFILE:
>>> /ws/ompi-tools/orte/sge/sge6_2u1/default/spool/burl-ct-280r-2/active_jobs/199.1/pe_hostfile
>>> [..snip..]
>>> [burl-ct-280r-2:22132] ras:gridengine: burl-ct-280r-0: PE_HOSTFILE shows
>>> slots=2
>>> [burl-ct-280r-2:22132] ras:gridengine: burl-ct-280r-1: PE_HOSTFILE shows
>>> slots=2
>>> [..snip..]
>>> burl-ct-280r-1
>>> burl-ct-280r-1
>>> burl-ct-280r-0
>>> burl-ct-280r-0
>>> burl-ct-280r-0 151 =>
>>>
>>>
>>> On 03/31/09 22:39, PN wrote:
>>>
>>>> Dear Rolf,
>>>> Thanks for your reply.
>>>> I've created another PE and changed the submission script to
>>>> explicitly specify the hostnames with "--host".
>>>> However, the result is the same.
>>>> # qconf -sp orte
>>>> pe_name orte
>>>> slots 8
>>>> user_lists NONE
>>>> xuser_lists NONE
>>>> start_proc_args /bin/true
>>>> stop_proc_args /bin/true
>>>> allocation_rule $fill_up
>>>> control_slaves TRUE
>>>> job_is_first_task FALSE
>>>> urgency_slots min
>>>> accounting_summary TRUE
>>>> $ cat hpl-8cpu-test.sge
>>>> #!/bin/bash
>>>> #
>>>> #$ -N HPL_8cpu_GB
>>>> #$ -pe orte 8
>>>> #$ -cwd
>>>> #$ -j y
>>>> #$ -S /bin/bash
>>>> #$ -V
>>>> #
>>>> cd /home/admin/hpl-2.0
>>>> /opt/openmpi-gcc/bin/mpirun -v -np $NSLOTS \
>>>>     --host node0001,node0001,node0001,node0001,node0002,node0002,node0002,node0002 \
>>>>     ./bin/goto-openmpi-gcc/xhpl
>>>> # pdsh -a ps ax --width=200|grep hpl
>>>> node0002: 18901 ? S 0:00 /opt/openmpi-gcc/bin/mpirun -v -np
>>>> 8 --host
>>>> node0001,node0001,node0001,node0001,node0002,node0002,node0002,node0002
>>>> ./bin/goto-openmpi-gcc/xhpl
>>>> node0002: 18902 ? RLl 0:29 ./bin/goto-openmpi-gcc/xhpl
>>>> node0002: 18903 ? RLl 0:29 ./bin/goto-openmpi-gcc/xhpl
>>>> node0002: 18904 ? RLl 0:28 ./bin/goto-openmpi-gcc/xhpl
>>>> node0002: 18905 ? RLl 0:28 ./bin/goto-openmpi-gcc/xhpl
>>>> node0002: 18906 ? RLl 0:29 ./bin/goto-openmpi-gcc/xhpl
>>>> node0002: 18907 ? RLl 0:28 ./bin/goto-openmpi-gcc/xhpl
>>>> node0002: 18908 ? RLl 0:28 ./bin/goto-openmpi-gcc/xhpl
>>>> node0002: 18909 ? RLl 0:28 ./bin/goto-openmpi-gcc/xhpl
>>>> Any hint on how to debug this situation?
>>>> Also, if I have 2 IB ports in each node, on which IB bonding has
>>>> been done, will Open MPI automatically benefit from the double
>>>> bandwidth?
>>>> Thanks a lot.
>>>> Best Regards,
>>>> PN
>>>> 2009/4/1 Rolf Vandevaart <Rolf.Vandevaart_at_[hidden]>
>>>> On 03/31/09 11:43, PN wrote:
>>>> Dear all,
>>>> I'm using Open MPI 1.3.1 and SGE 6.2u2 on CentOS 5.2
>>>> I have 2 compute nodes for testing, each node has a single quad
>>>> core CPU.
>>>> Here is my submission script and PE config:
>>>> $ cat hpl-8cpu.sge
>>>> #!/bin/bash
>>>> #
>>>> #$ -N HPL_8cpu_IB
>>>> #$ -pe mpi-fu 8
>>>> #$ -cwd
>>>> #$ -j y
>>>> #$ -S /bin/bash
>>>> #$ -V
>>>> #
>>>> cd /home/admin/hpl-2.0
>>>> # For IB
>>>> /opt/openmpi-gcc/bin/mpirun -v -np $NSLOTS -machinefile
>>>> $TMPDIR/machines ./bin/goto-openmpi-gcc/xhpl
>>>> I've tested that the mpirun command runs correctly on the
>>>> command line.
>>>> $ qconf -sp mpi-fu
>>>> pe_name mpi-fu
>>>> slots 8
>>>> user_lists NONE
>>>> xuser_lists NONE
>>>> start_proc_args /opt/sge/mpi/startmpi.sh -catch_rsh $pe_hostfile
>>>> stop_proc_args /opt/sge/mpi/stopmpi.sh
>>>> allocation_rule $fill_up
>>>> control_slaves TRUE
>>>> job_is_first_task FALSE
>>>> urgency_slots min
>>>> accounting_summary TRUE
>>>> I've checked the $TMPDIR/machines after submit, it was correct.
>>>> node0002
>>>> node0002
>>>> node0002
>>>> node0002
>>>> node0001
>>>> node0001
>>>> node0001
>>>> node0001
>>>> However, I found that if I explicitly specify "-machinefile
>>>> $TMPDIR/machines", all 8 MPI processes are spawned on a
>>>> single node, i.e. node0002.
>>>> However, if I omit "-machinefile $TMPDIR/machines" from the
>>>> mpirun line, i.e.
>>>> /opt/openmpi-gcc/bin/mpirun -v -np $NSLOTS
>>>> ./bin/goto-openmpi-gcc/xhpl
>>>> the MPI processes start correctly: 4 processes on node0001
>>>> and 4 processes on node0002.
>>>> Is this normal behaviour of Open MPI?
>>>> I just tried it both ways and I got the same result both times: the
>>>> processes are split between the nodes. Perhaps, to be extra sure,
>>>> you can just run hostname? And for what it is worth, as you have
>>>> seen, you do not need to specify a machines file; Open MPI will use
>>>> the nodes that were allocated by SGE. You can also change your
>>>> parallel environment so that it does not run any scripts, like this:
>>>> start_proc_args /bin/true
>>>> stop_proc_args /bin/true
>>>> Also, I wondered: if I have an IB interface, for example the IB
>>>> hostnames become node0001-clust and node0002-clust, will
>>>> Open MPI automatically use the IB interface?
>>>> Yes, it should use the IB interface.
>>>> How about if I have 2 IB ports in each node, on which IB bonding
>>>> has been done? Will Open MPI automatically benefit from the double
>>>> bandwidth?
>>>> Thanks a lot.
>>>> Best Regards,
>>>> PN
>>>>
>>>
>>>
>>> --
>>>
>>> =========================
>>> rolf.vandevaart_at_[hidden]
>>> 781-442-3043
>>> =========================