Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] --without-tm [SEC=UNCLASSIFIED]
From: Ralph Castain (rhc_at_[hidden])
Date: 2011-02-21 09:43:54


Simplest solution: add -bynode to your mpirun command line.
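
As a rough sketch, assuming the rest of the mpirun line in the quoted script stays unchanged, the modified command might look like the following. The variable name MPIRUN_CMD is only for illustration and is not part of the original script; -bynode asks mpirun to place ranks round-robin across the nodes instead of filling each node's slots first.

```shell
# Illustrative only: the mpirun line from the quoted PBS script with -bynode added.
# MPIRUN_CMD is a hypothetical variable used here so the line can be inspected.
MPIRUN_CMD="mpirun -bynode -am ft-enable-cr -machinefile hostfile ex5mpi testData"

# Print the command for inspection; in the PBS script itself the command
# would be run directly in place of the original mpirun line.
echo "$MPIRUN_CMD"
```

Whether this alone is enough on a build configured with --without-tm would need to be confirmed on the cluster in question.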

On Feb 20, 2011, at 10:50 PM, DOHERTY, Greg wrote:

> In order to be able to checkpoint openmpi jobs with blcr, we have
> configured openmpi as follows
>
> ./configure --prefix=/data1/packages/openmpi/1.5.1-blcr-without-tm
> --disable-openib-connectx-xrc --disable-openib-rdmacm --with-ft=cr
> --enable-mpi-threads --enable-ft-thread --with-blcr=/usr
> --with-blcr-libdir=/usr/include --without-tm
>
> When used in conjunction with torque2.5.3, we are able to start the
> following job with 8 cores on one node, but if we try to start the same
> job with 4 cores on each of two nodes, the job starts 4 cores on the
> primary node, but not the remaining 4 cores on the second node.
>
> $ cat PBStest
> #!/bin/sh
> #PBS -c enabled
> #PBS -l walltime=25:00:00
> #PBS -l nodes=2:ppn=4
> #PBS -m ae
> #PBS -M gdz_at_[hidden]
> #PBS -N Prob8
> #PBS -r n
> #PBS -q blcrq
> source /etc/profile.d/00-modules.sh
> module load mpi/openmpi_1.5-blcr-without-tm
> NN=`cat $PBS_NODEFILE | wc -l`
> cd $PBS_O_WORKDIR
> cat $PBS_NODEFILE > hostfile
> cat $PBS_NODEFILE
> pwd
> echo "NN = $NN "
> date
> which mpirun
> cd $PBS_O_WORKDIR
> mpirun -am ft-enable-cr -machinefile hostfile ex5mpi testData
> --------------------------------------------------------------
> The hostfile correctly lists the primary node 4 times, and then the
> second node 4 times.
>
> When openmpi is built --with-tm, which is the default if --without-tm is
> not specified, the job correctly starts on the 8 cores spread across the
> 2 nodes.
>
> blcr needs cr_mpirun to start the job without torque support to be able
> to checkpoint the mpi job correctly.
>
> My question is whether it is possible for the script above to be
> modified in order to start on multiple nodes if openmpi has been built
> with --without-tm and, if so, what needs to be added or deleted from the
> script?
> I have tried -mca plm ^tm with openmpi built --with-tm which also will
> not start the second 4 mpi ranks.
>
> Any suggestions gratefully accepted.
> Greg Doherty
> ANSTO