Open MPI User's Mailing List Archives

Subject: [OMPI users] SGE error: executing task of job 22966 failed:
From: Korambath, Prakashan (ppk_at_[hidden])
Date: 2008-04-03 15:04:05


I just compiled Open MPI version 1.2.5 with the following options:

./configure --prefix=/u/local/mpi/openmpi/1.2.5 --with-openib=/usr/local --enable-static --disable-shared CC=icc CXX=icpc F77=ifort FC=ifort --with-sge

on an x86_64 machine with an InfiniBand interconnect, OFED software, and CentOS 5.

Everything works fine when I launch jobs from the command line, but when I submit through SGE 6.1U3 I get the following error:

error: executing task of job 23081 failed:
[n99:01442] ERROR: A daemon on node n99 failed to start as expected.
[n99:01442] ERROR: There may be more information available from
[n99:01442] ERROR: the 'qstat -t' command on the Grid Engine tasks.
[n99:01442] ERROR: If the problem persists, please restart the
[n99:01442] ERROR: Grid Engine PE job
[n99:01442] ERROR: The daemon exited unexpectedly with status 1.

My SGE job script contains:
#$ -pe orte 2

/u/local/mpi/openmpi/1.2.5/bin/mpiexec -n 2 -machinefile $TMPDIR/nodefile \
         /u/home2/ppk/MPI/C/executablename >& output

n99:/work/23081.1.campus.q {1002}$ cat nodefile
n99 slots=1
n15 slots=1
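
For reference, here is a minimal sketch of how a nodefile in the format above could be derived from Grid Engine's PE hostfile. It assumes the usual `$PE_HOSTFILE` layout of one `host nslots queue processor-range` entry per line. The helper name `pe_to_nodefile` is hypothetical and is not part of SGE or Open MPI. An Open MPI build configured with `--with-sge` is documented to read the Grid Engine allocation itself, so an explicit `-machinefile` may not be needed at all.

```shell
# Hypothetical helper: convert Grid Engine's PE hostfile format
# ("host nslots queue processor-range", one host per line) read from
# stdin into "host slots=N" lines like the nodefile shown above.
pe_to_nodefile() {
    # Skip blank or malformed lines that lack a slot count.
    awk 'NF >= 2 { print $1 " slots=" $2 }'
}

# Possible use inside a job script (assumption: Grid Engine sets
# $PE_HOSTFILE and $TMPDIR for parallel jobs):
#   pe_to_nodefile < "$PE_HOSTFILE" > "$TMPDIR/nodefile"
```

Comparing the generated file against `$PE_HOSTFILE` on the failing node is one way to confirm that the host list handed to mpiexec matches what Grid Engine actually allocated.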

n99:/work/23081.1.campus.q {1003}$ qconf -sp orte
pe_name orte
slots 360
user_lists NONE
xuser_lists NONE
start_proc_args /bin/true
stop_proc_args /bin/true
allocation_rule $round_robin
control_slaves TRUE
job_is_first_task TRUE
urgency_slots min

I have been combing through the archives for similar errors. I have found a few reports, but no satisfactory answer. Does anyone know why this happens?

i02:/u/local/mpi/openmpi/1.2.5/bin {1049}$ ./ompi_info | grep tm
              MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.5)
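
Note that `grep tm` looks for the Torque/PBS (`tm`) components and matched `ptmalloc2` here only because the name contains "tm". For an SGE build, the components to look for are, as far as I know, named `gridengine`. The sketch below wraps that check in a small function that reads `ompi_info` output on stdin. The function name `has_gridengine` is hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: read `ompi_info` output on stdin and report
# whether any component with "gridengine" in its name is listed.
has_gridengine() {
    if grep -q 'gridengine'; then
        echo "gridengine support found"
    else
        echo "no gridengine component listed"
        return 1
    fi
}

# Possible use on a node with this installation (path taken from above):
#   /u/local/mpi/openmpi/1.2.5/bin/ompi_info | has_gridengine
```

If no gridengine component is listed, the `--with-sge` configure option probably did not take effect, and that would be worth ruling out before looking at the PE configuration.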

I also tried the pre-release 1.2.6rc3, with the same results.