Dear all,
I'm using Open MPI 1.3.1 and SGE 6.2u2 on CentOS 5.2.
I have two compute nodes for testing; each node has a single quad-core CPU.
Here are my submission script and PE configuration:
$ cat hpl-8cpu.sge
#!/bin/bash
#
#$ -N HPL_8cpu_IB
#$ -pe mpi-fu 8
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
#
cd /home/admin/hpl-2.0
# For IB
/opt/openmpi-gcc/bin/mpirun -v -np $NSLOTS -machinefile $TMPDIR/machines ./bin/goto-openmpi-gcc/xhpl
I've verified that the mpirun command runs correctly when launched from the command line.
$ qconf -sp mpi-fu
pe_name mpi-fu
slots 8
user_lists NONE
xuser_lists NONE
start_proc_args /opt/sge/mpi/startmpi.sh -catch_rsh $pe_hostfile
stop_proc_args /opt/sge/mpi/stopmpi.sh
allocation_rule $fill_up
control_slaves TRUE
job_is_first_task FALSE
urgency_slots min
accounting_summary TRUE
I checked $TMPDIR/machines after submitting the job, and its contents were correct:
node0002
node0002
node0002
node0002
node0001
node0001
node0001
node0001
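For a quicker check than reading the file by eye, a per-host slot count can be sketched as below. This is only a sketch: slots_per_host is a hypothetical helper name, not part of SGE or Open MPI, and it assumes the file lists one hostname per slot, as in the listing above.

```shell
# Count how many slots each host has in a machinefile that lists
# one hostname per slot. slots_per_host is a hypothetical helper name.
slots_per_host() {
    # Skip blank lines, count occurrences of the first field,
    # then sort so the output order is stable.
    awk 'NF { count[$1]++ } END { for (h in count) print h, count[h] }' "$1" | sort
}

# Usage inside the job script (assumption: the file exists at this path):
#   slots_per_host "$TMPDIR/machines"
```

For the file shown above, this should report 4 slots for node0001 and 4 for node0002.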
However, when I explicitly specify "-machinefile $TMPDIR/machines", all 8 MPI processes are spawned on a single node, namely node0002.
If I omit "-machinefile $TMPDIR/machines" from the mpirun line, i.e.
/opt/openmpi-gcc/bin/mpirun -v -np $NSLOTS ./bin/goto-openmpi-gcc/xhpl
the MPI processes start correctly: 4 processes on node0001 and 4 processes on node0002.
Is this the normal behaviour of Open MPI?
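To make the comparison between the two mpirun variants repeatable, the xhpl binary can be swapped for hostname and the result compared with the machinefile. The following is a rough sketch under that assumption; compare_placement is a hypothetical helper name, and it only compares per-host counts, not rank order.

```shell
# Compare where ranks actually ran (hostnames on stdin, one per rank)
# against the hosts listed in a machinefile (path given as $1).
# compare_placement is a hypothetical helper name.
compare_placement() {
    cp_expected=$(sort "$1" | uniq -c)   # per-host counts from the machinefile
    cp_actual=$(sort | uniq -c)          # per-host counts from stdin
    if [ "$cp_expected" = "$cp_actual" ]; then
        echo "placement matches"
    else
        echo "placement differs"
        echo "expected:"; echo "$cp_expected"
        echo "actual:";   echo "$cp_actual"
        return 1
    fi
}

# Usage inside the job script (assumption: same paths as in the script above):
#   /opt/openmpi-gcc/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines hostname \
#       | compare_placement "$TMPDIR/machines"
```

Running it once with and once without the -machinefile option would show the difference described above as per-host counts instead of having to inspect each node by hand.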
Also, suppose each node has an IB interface, with IB hostnames such as node0001-clust and node0002-clust. Will Open MPI automatically use the IB interface?
And if each node has 2 IB ports that are bonded together, will Open MPI automatically benefit from the doubled bandwidth?
Thanks a lot.
Best Regards,
PN