Open MPI User's Mailing List Archives


Subject: [OMPI users] Strange behaviour of SGE+OpenMPI
From: PN (poknam_at_[hidden])
Date: 2009-03-31 11:43:25

Dear all,

I'm using Open MPI 1.3.1 and SGE 6.2u2 on CentOS 5.2.
I have 2 compute nodes for testing; each node has a single quad-core CPU.

Here is my submission script and PE config:
$ cat hpl-8cpu.sge
#$ -N HPL_8cpu_IB
#$ -pe mpi-fu 8
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
cd /home/admin/hpl-2.0
# For IB
/opt/openmpi-gcc/bin/mpirun -v -np $NSLOTS -machinefile $TMPDIR/machines ./bin/goto-openmpi-gcc/xhpl

I've tested that this mpirun command runs correctly from the command line.

$ qconf -sp mpi-fu
pe_name mpi-fu
slots 8
user_lists NONE
xuser_lists NONE
start_proc_args /opt/sge/mpi/ -catch_rsh $pe_hostfile
stop_proc_args /opt/sge/mpi/
allocation_rule $fill_up
control_slaves TRUE
job_is_first_task FALSE
urgency_slots min
accounting_summary TRUE
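For context, with allocation_rule $fill_up and 8 requested slots on two quad-core nodes, the machines file generated by the PE start script would be expected to list each host once per allocated slot, roughly like this (a sketch only; the exact format and host order depend on the SGE start script and scheduler):

```
node0002
node0002
node0002
node0002
node0001
node0001
node0001
node0001
```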

I've checked $TMPDIR/machines after submitting the job; its contents were correct.

However, I found that if I explicitly specify "-machinefile
$TMPDIR/machines", all 8 MPI processes are spawned on a single node,
i.e. node0002.

However, if I omit "-machinefile $TMPDIR/machines" from the mpirun line, i.e.
/opt/openmpi-gcc/bin/mpirun -v -np $NSLOTS ./bin/goto-openmpi-gcc/xhpl

the MPI processes start correctly: 4 processes on node0001 and 4
processes on node0002.
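A quick way to compare the two cases is to substitute hostname for xhpl in the job script, so each rank simply prints the node it lands on (a diagnostic sketch using the same paths as above; output depends on the cluster):

```shell
# Same job script, but print each rank's node instead of running HPL
/opt/openmpi-gcc/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines hostname
/opt/openmpi-gcc/bin/mpirun -np $NSLOTS hostname
```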

Is this normal behaviour of Open MPI?

Also, I wonder about the IB interface: for example, if the IB hostnames
are node0001-clust and node0002-clust, will Open MPI automatically use
the IB interface?

And if each node has 2 IB ports with IB bonding configured, will
Open MPI automatically benefit from the doubled bandwidth?
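For reference, transport selection in Open MPI is driven by the hardware it detects, not by hostnames; it can be pinned or inspected explicitly with MCA parameters. A hedged sketch (the BTL list below is the usual one for Open MPI 1.3, but verify against your own build):

```shell
# Restrict Open MPI to the InfiniBand (openib), shared-memory and self BTLs
/opt/openmpi-gcc/bin/mpirun --mca btl openib,sm,self -np $NSLOTS ./bin/goto-openmpi-gcc/xhpl
# List the BTL components actually available in this build
/opt/openmpi-gcc/bin/ompi_info | grep btl
```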

Thanks a lot.

Best Regards,