On 03/31/09 11:43, PN wrote:
> Dear all,
> I'm using Open MPI 1.3.1 and SGE 6.2u2 on CentOS 5.2.
> I have 2 compute nodes for testing; each node has a single quad-core CPU.
> Here is my submission script and PE config:
> $ cat hpl-8cpu.sge
> #$ -N HPL_8cpu_IB
> #$ -pe mpi-fu 8
> #$ -cwd
> #$ -j y
> #$ -S /bin/bash
> #$ -V
> cd /home/admin/hpl-2.0
> # For IB
> /opt/openmpi-gcc/bin/mpirun -v -np $NSLOTS -machinefile $TMPDIR/machines ./bin/goto-openmpi-gcc/xhpl
> I've tested that the mpirun command runs correctly from the command line.
> $ qconf -sp mpi-fu
> pe_name mpi-fu
> slots 8
> user_lists NONE
> xuser_lists NONE
> start_proc_args /opt/sge/mpi/startmpi.sh -catch_rsh $pe_hostfile
> stop_proc_args /opt/sge/mpi/stopmpi.sh
> allocation_rule $fill_up
> control_slaves TRUE
> job_is_first_task FALSE
> urgency_slots min
> accounting_summary TRUE
> I've checked $TMPDIR/machines after submitting, and it was correct.
> However, I found that if I explicitly specify "-machinefile
> $TMPDIR/machines", all 8 MPI processes were spawned on a single
> node, i.e. node0002.
> If I omit "-machinefile $TMPDIR/machines" from the mpirun line, i.e.
> /opt/openmpi-gcc/bin/mpirun -v -np $NSLOTS ./bin/goto-openmpi-gcc/xhpl
> then the MPI processes start correctly, with 4 processes on node0001
> and 4 on node0002.
> Is this normal behaviour of Open MPI?
I just tried it both ways and I got the same result both times: the
processes are split between the nodes. Perhaps to be extra sure, you
could just run hostname instead of xhpl? And for what it is worth, as
you have seen, you do not need to specify a machines file; Open MPI will
use the hosts that were allocated by SGE. You can also change your
parallel environment so that it does not run the startmpi/stopmpi
scripts at all.
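Something along these lines should work (just a sketch; the PE name
"orte" is a placeholder, and the rest mirrors your mpi-fu setup):

$ qconf -sp orte
pe_name orte
slots 8
user_lists NONE
xuser_lists NONE
start_proc_args /bin/true
stop_proc_args /bin/true
allocation_rule $fill_up
control_slaves TRUE
job_is_first_task FALSE
urgency_slots min
accounting_summary TRUE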
> Also, I wondered: if I have an IB interface, for example the IB hostnames
> are node0001-clust and node0002-clust, will Open MPI automatically use
> the IB interface?
Yes, it should use the IB interface.
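If you want to confirm (or force) that InfiniBand is actually being
used, you can restrict the BTL list on the mpirun line. A rough sketch,
assuming the openib BTL was built into your Open MPI install:

# Allow only the InfiniBand (openib), shared-memory and self BTLs;
# if openib cannot be used, the job should abort rather than silently
# fall back to TCP.
/opt/openmpi-gcc/bin/mpirun --mca btl openib,sm,self -np $NSLOTS ./bin/goto-openmpi-gcc/xhpl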
> And what if I have 2 IB ports in each node with IB bonding configured;
> will Open MPI automatically benefit from the doubled bandwidth?
> Thanks a lot.
> Best Regards,