Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] can't run the code on Jaguar
From: bin wang (bighead521_at_[hidden])
Date: 2012-03-06 18:28:05


Hello Ralph,

Thanks for your reply.

To start my job, I tried the following two ways:
(1) Configured/compiled Open MPI and compiled the benchmark on the head node,
      then submitted a PBS job.
(2) Submitted an interactive job to redo the configure/compile on a compute node,
      then used "/path/to/mpicc -o hello hello_world.c" to compile the benchmark
      and "/path/to/mpirun -np 2 /path/to/hello" to run the job (sketched below).
I also tried running "/path/to/mpirun -np 2 hostname" but got the same error.
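
For reference, approach (2) looked roughly like the following; the qsub
options and install paths are placeholders rather than my exact values:

   # interactive allocation (placeholder PBS options)
   qsub -I -l walltime=00:30:00 -l nodes=1:ppn=16 -A <project>

   # compile the benchmark with the wrapper compiler from my install
   /path/to/openmpi/bin/mpicc -o hello hello_world.c

   # launch with the matching mpirun
   /path/to/openmpi/bin/mpirun -np 2 /path/to/hello

   # same failure even with a non-MPI program
   /path/to/openmpi/bin/mpirun -np 2 hostname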

The configure line is pretty long.

$SRCDIR/configure \
  --prefix=$PREFIX \
  --enable-static --disable-shared --disable-dlopen --disable-pretty-print-stacktrace --disable-pty-support --disable-io-romio --enable-contrib-no-build=libnbc,vt --enable-debug \
  --with-memory-manager=none --with-threads \
  --without-tm \
  --with-wrapper-ldflags="${ADD_WRAPPER_LDFLAGS}" \
  --with-wrapper-libs="-lnsl -lpthread -lm" \
  --with-platform=optimized \
  --with-ugni=/opt/cray/ugni/2.3-1.0400.3912.4.29.gem \
  --with-ugni-libdir=/opt/cray/ugni/2.3-1.0400.3912.4.29.gem/lib64 \
  --with-ugni-includedir=/opt/cray/gni-headers/2.1-1.0400.3906.5.1.gem/include \
  --with-xpmem=/opt/cray/xpmem/0.1-2.0400.29883.4.6.gem \
  --with-xpmem-libdir=/opt/cray/xpmem/0.1-2.0400.29883.4.6.gem/lib64 \
  --enable-mem-debug --enable-mem-profile --enable-debug-symbols --enable-binaries \
  --enable-picky --enable-mpi-f77 --enable-mpi-f90 --enable-mpi-cxx --enable-mpi-cxx-seek \
  --without-slurm --with-memory-manager=ptmalloc2 \
  --with-pmi=/opt/cray/pmi/2.1.4-1.0000.8596.8.9.gem --with-cray-pmi-ext \
  --enable-mca-no-build=maffinity-first_use,maffinity-libnuma,ess-cnos,filem-rsh,grpcomm-cnos,pml-dr \
  ${ADD_COMPILER} \
  CPPFLAGS="${ADD_CPPFLAGS} -I${gniheaders}" \
  FFLAGS="${ADD_FFLAGS} -I${gniheaders}" \
  FCFLAGS="${ADD_FCFLAGS} -I/usr/include -I${gniheaders}" \
  CFLAGS="-I/usr/include -I${gniheaders}" \
  LDFLAGS="--static ${ADD_LDFLAGS} ${UGNILIBS} ${XPMEMLIBS}" \
  LIBS="${ADD_LIBS} -lpthread -lrt -lpthread -lm" | tee build.log
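
If it helps narrow things down, I can also list which ess components were
actually built and raise the selection verbosity at launch; these are just
the standard ompi_info and MCA verbosity knobs, and I have not captured
their output yet:

   # show the ess components included in this build
   /path/to/ompi_info | grep ess

   # print how the ess component gets selected at launch
   /path/to/mpirun --mca ess_base_verbose 10 -np 2 /path/to/hello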

Any ideas?

Bin WANG

On Mon, Mar 5, 2012 at 7:13 PM, Ralph Castain <rhc.openmpi_at_[hidden]> wrote:

> How did you attempt to start your job, and what does your configure line
> look like?
>
> Sent from my iPad
>
> On Mar 5, 2012, at 2:11 PM, bin Wang <bighead521_at_[hidden]> wrote:
>
> > Hello All,
> >
> > I'm trying to run the latest Open MPI code on Jaguar
> > (cloned from the Open MPI Mercurial mirror of the Subversion repository).
> > The configuration and compilation of Open MPI were fine, and the benchmark
> > was also compiled successfully. I tried to launch my program using mpirun
> > within an interactive job, but it failed immediately.
> >
> > The core dump file gave me the following information.
> > ====================Error Msg=========================
> > [jaguarpf-login2:15370] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> > start a daemon on the local node in file ess_singleton_module.c at line 220
> >
> --------------------------------------------------------------------------
> > It looks like orte_init failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during orte_init; some of which are due to configuration or
> > environment problems. This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> > ompi_mpi_init: orte_init failed
> > --> Returned value Unable to start a daemon on the local node (-127)
> > instead of ORTE_SUCCESS
> >
> >
> --------------------------------------------------------------------------
> > It looks like MPI_INIT failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during MPI_INIT; some of which are due to configuration or environment
> > problems. This failure appears to be an internal failure; here's some
> > additional information (which may only be relevant to an Open MPI
> > developer):
> > ompi_mpi_init: orte_init failed
> > --> Returned "Unable to start a daemon on40he local node" (-127) instead
> of "Success" (0)
> >
> --------------------------------------------------------------------------
> > [jaguarpf-login2:15370] *** An error occurred in MPI_Init
> > [jaguarpf-login2:15370] *** reported by process [4294967295,4294967295]
> > [jaguarpf-login2:15370] *** on a NULL communicator
> > [jaguarpf-login2:15370] *** Unknown error
> > [jaguarpf-login2:15370] *** MPI_ERRORS_ARE_FATAL (processes in this
> > communicator will now abort,
> > [jaguarpf-login2:15370] *** and potentially your MPI job)
> >
> --------------------------------------------------------------------------
> > An MPI process is aborting at a time when it cannot guarantee that all
> > of its peer processes in the job will be killed properly. You should
> > double check that everything has shut down cleanly.
> > Reason: Before MPI_INIT completed
> > Local host: jaguarpf-login2
> > PID: 15370
> >
> --------------------------------------------------------------------------
> > Program exited with code 01.
> > ====================Error Msg Over=====================
> >
> > There are several components under ess, but I don't know why or how the
> > singleton component was chosen.
> >
> > I hope someone can help me compile and run Open MPI successfully on Jaguar.
> >
> > Any comments and suggestions will be appreciated.
> >
> > Thanks,
> >
> > --Bin
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>