
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Strange behaviour of SGE+OpenMPI
From: PN (poknam_at_[hidden])
Date: 2009-03-31 22:39:36


Dear Rolf,

Thanks for your reply.
I've created another PE and changed the submission script to explicitly
specify the hostnames with "--host".
However, the result is the same.

# qconf -sp orte
pe_name orte
slots 8
user_lists NONE
xuser_lists NONE
start_proc_args /bin/true
stop_proc_args /bin/true
allocation_rule $fill_up
control_slaves TRUE
job_is_first_task FALSE
urgency_slots min
accounting_summary TRUE

$ cat hpl-8cpu-test.sge
#!/bin/bash
#
#$ -N HPL_8cpu_GB
#$ -pe orte 8
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
#
cd /home/admin/hpl-2.0
/opt/openmpi-gcc/bin/mpirun -v -np $NSLOTS \
    --host node0001,node0001,node0001,node0001,node0002,node0002,node0002,node0002 \
    ./bin/goto-openmpi-gcc/xhpl
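
To rule out the application itself, the same job can be resubmitted with `hostname` in place of `xhpl`; a minimal sketch reusing the `orte` PE above (the job name is arbitrary) — the output should show four lines per node if placement is correct:

```shell
#!/bin/bash
#$ -N host_check
#$ -pe orte 8
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
# Each rank prints the host it landed on; with a correct $fill_up
# allocation across two quad-core nodes, expect 4 lines per node.
/opt/openmpi-gcc/bin/mpirun -np $NSLOTS hostname
```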

# pdsh -a ps ax --width=200|grep hpl
node0002: 18901 ? S 0:00 /opt/openmpi-gcc/bin/mpirun -v -np 8 --host node0001,node0001,node0001,node0001,node0002,node0002,node0002,node0002 ./bin/goto-openmpi-gcc/xhpl
node0002: 18902 ? RLl 0:29 ./bin/goto-openmpi-gcc/xhpl
node0002: 18903 ? RLl 0:29 ./bin/goto-openmpi-gcc/xhpl
node0002: 18904 ? RLl 0:28 ./bin/goto-openmpi-gcc/xhpl
node0002: 18905 ? RLl 0:28 ./bin/goto-openmpi-gcc/xhpl
node0002: 18906 ? RLl 0:29 ./bin/goto-openmpi-gcc/xhpl
node0002: 18907 ? RLl 0:28 ./bin/goto-openmpi-gcc/xhpl
node0002: 18908 ? RLl 0:28 ./bin/goto-openmpi-gcc/xhpl
node0002: 18909 ? RLl 0:28 ./bin/goto-openmpi-gcc/xhpl

Any hint to debug this situation?
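
One way to see which hosts Open MPI believes SGE allocated is its mapping flags; a sketch (`--display-allocation` and `--display-map` are mpirun options in Open MPI 1.3; the `ras_gridengine_verbose` MCA parameter name is an assumption and may vary by version):

```shell
# Print the node allocation Open MPI received, and the rank-to-node map:
/opt/openmpi-gcc/bin/mpirun -np $NSLOTS --display-allocation --display-map hostname

# Extra verbosity from the SGE allocation module (parameter name assumed):
/opt/openmpi-gcc/bin/mpirun -np $NSLOTS --mca ras_gridengine_verbose 100 hostname
```

If the displayed allocation already lists only node0002, the problem is on the SGE side; if the allocation is correct but the map is not, it is on the Open MPI side.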

Also, if each node has 2 IB ports with IB bonding configured, will
Open MPI automatically benefit from the doubled bandwidth?
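
As a starting point for the IB question, `ompi_info` can confirm the openib BTL was built in and list its tunables, and the transport can be pinned explicitly; a sketch (the `self,sm,openib` BTL list is the usual set for Open MPI 1.3):

```shell
# Confirm the openib BTL is compiled in, and list its parameters:
ompi_info | grep openib
ompi_info --param btl openib

# Force IB explicitly (shared memory on-node, openib between nodes):
/opt/openmpi-gcc/bin/mpirun -np $NSLOTS --mca btl self,sm,openib \
    ./bin/goto-openmpi-gcc/xhpl
```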

Thanks a lot.

Best Regards,
PN

2009/4/1 Rolf Vandevaart <Rolf.Vandevaart_at_[hidden]>

> On 03/31/09 11:43, PN wrote:
>
>> Dear all,
>>
>> I'm using Open MPI 1.3.1 and SGE 6.2u2 on CentOS 5.2
>> I have 2 compute nodes for testing, each node has a single quad core CPU.
>>
>> Here is my submission script and PE config:
>> $ cat hpl-8cpu.sge
>> #!/bin/bash
>> #
>> #$ -N HPL_8cpu_IB
>> #$ -pe mpi-fu 8
>> #$ -cwd
>> #$ -j y
>> #$ -S /bin/bash
>> #$ -V
>> #
>> cd /home/admin/hpl-2.0
>> # For IB
>> /opt/openmpi-gcc/bin/mpirun -v -np $NSLOTS -machinefile $TMPDIR/machines
>> ./bin/goto-openmpi-gcc/xhpl
>>
>> I've tested that the mpirun command runs correctly from the command line.
>>
>> $ qconf -sp mpi-fu
>> pe_name mpi-fu
>> slots 8
>> user_lists NONE
>> xuser_lists NONE
>> start_proc_args /opt/sge/mpi/startmpi.sh -catch_rsh $pe_hostfile
>> stop_proc_args /opt/sge/mpi/stopmpi.sh
>> allocation_rule $fill_up
>> control_slaves TRUE
>> job_is_first_task FALSE
>> urgency_slots min
>> accounting_summary TRUE
>>
>>
>> I've checked the $TMPDIR/machines after submit, it was correct.
>> node0002
>> node0002
>> node0002
>> node0002
>> node0001
>> node0001
>> node0001
>> node0001
>>
>> However, I found that if I explicitly specify the "-machinefile
>> $TMPDIR/machines", all 8 mpi processes were spawned within a single node,
>> i.e. node0002.
>>
>> However, if I omit "-machinefile $TMPDIR/machines" from the mpirun line,
>> i.e.
>> /opt/openmpi-gcc/bin/mpirun -v -np $NSLOTS ./bin/goto-openmpi-gcc/xhpl
>>
>> The mpi processes can start correctly, 4 processes in node0001 and 4
>> processes in node0002.
>>
>> Is this normal behaviour of Open MPI?
>>
>
> I just tried it both ways and I got the same result both times. The
> processes are split between the nodes. Perhaps to be extra sure, you can
> just run hostname? And for what it is worth, as you have seen, you do not
> need to specify a machines file. Open MPI will use the ones that were
> allocated by SGE. You can also change your parallel queue to not run any
> scripts. Like this:
>
> start_proc_args /bin/true
> stop_proc_args /bin/true
>
>
>> Also, I wondered: if I have an IB interface (for example, the IB
>> hostnames become node0001-clust and node0002-clust), will Open MPI
>> automatically use the IB interface?
>>
> Yes, it should use the IB interface.
>
>>
>> How about if each node has 2 IB ports with IB bonding configured --
>> will Open MPI automatically benefit from the doubled bandwidth?
>>
>> Thanks a lot.
>>
>> Best Regards,
>> PN
>>
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
>
> --
>
> =========================
> rolf.vandevaart_at_[hidden]
> 781-442-3043
> =========================