On 04/06/2011 07:09 PM, Jason Palmer wrote:
> I am having trouble running a batch job in SGE using openmpi. I have read
> the faq, which says that openmpi will automatically do the right thing, but
> something seems to be wrong.
> Previously I used MPICH1 under SGE without any problems. I'm avoiding MPICH2
> because it doesn't seem to support static compilation, whereas I was able to
> get openmpi to compile with open64 and compile my program statically.
> But I am having problems launching. According to the documentation, I should
> be able to have a script file, qsub.sh:
> #$ -cwd
> #$ -j y
> #$ -S /bin/bash
> #$ -q all.q
> #$ -pe orte 18
> /home/jason/openmpi-1.4.3-install/bin/mpirun -np $NSLOTS myprog
If you have SGE integration, you should not specify the number of slots
requested on the command line. Open MPI will speak directly to SGE (or
vice versa) to get this information.
Also, what is the significance of specifying MPI_DIR? I think you want to
add that to your PATH, and then export it to the rest of the nodes by
using the -V switch to qsub. If the correct mpirun isn't found first in
your PATH, your job will definitely fail when launched on the slave hosts.
You should also add the path to the MPI libraries to your
LD_LIBRARY_PATH, or else you'll end up with run-time linking problems.
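A minimal sketch of that environment setup, to be run in the shell you submit from. It assumes Open MPI lives under the prefix taken from your mpirun path and that its libraries sit in lib/ beneath it. MPI_PREFIX is only an illustrative variable name, not something Open MPI or SGE looks for:

```shell
# Assumed install prefix, taken from the mpirun path in the original script.
MPI_PREFIX=/home/jason/openmpi-1.4.3-install

# Put this installation's mpirun first in PATH so it is the one that gets found.
export PATH="$MPI_PREFIX/bin:$PATH"

# Assumed library location (lib/ under the prefix). Prepend it, and only add
# a separating colon if LD_LIBRARY_PATH already had a value.
export LD_LIBRARY_PATH="$MPI_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

With these exported, the -V switch to qsub should carry both variables over to the job's environment.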
For example, I would change your submission script to look like this:
#$ -j y
#$ -S /bin/bash
#$ -q all.q
#$ -pe orte 18
This may not fix all your problems, but will definitely fix some of them.