Hello Ralph,
Thanks for your reply.
To start my job, I tried the following two approaches:
(1) Configured and compiled Open MPI and compiled the benchmark on the head
node, then submitted a PBS job.
(2) Submitted an interactive job and redid the configure/compile on a compute
node.
I then used "/path/to/mpicc -o hello hello_world.c" to compile the
benchmark and "/path/to/mpirun -np 2 /path/to/hello" to run the job.
I also tried running "/path/to/mpirun -np 2 hostname" but got the
same error.
The configure line is pretty long.
$SRCDIR/configure \
    --prefix=$PREFIX \
    --enable-static --disable-shared --disable-dlopen \
    --disable-pretty-print-stacktrace --disable-pty-support --disable-io-romio \
    --enable-contrib-no-build=libnbc,vt --enable-debug \
    --with-memory-manager=none --with-threads \
    --without-tm \
    --with-wrapper-ldflags="${ADD_WRAPPER_LDFLAGS}" \
    --with-wrapper-libs="-lnsl -lpthread -lm" \
    --with-platform=optimized \
    --with-ugni=/opt/cray/ugni/2.3-1.0400.3912.4.29.gem \
    --with-ugni-libdir=/opt/cray/ugni/2.3-1.0400.3912.4.29.gem/lib64 \
    --with-ugni-includedir=/opt/cray/gni-headers/2.1-1.0400.3906.5.1.gem/include \
    --with-xpmem=/opt/cray/xpmem/0.1-2.0400.29883.4.6.gem \
    --with-xpmem-libdir=/opt/cray/xpmem/0.1-2.0400.29883.4.6.gem/lib64 \
    --enable-mem-debug --enable-mem-profile --enable-debug-symbols \
    --enable-binaries \
    --enable-picky --enable-mpi-f77 --enable-mpi-f90 --enable-mpi-cxx \
    --enable-mpi-cxx-seek \
    --without-slurm --with-memory-manager=ptmalloc2 \
    --with-pmi=/opt/cray/pmi/2.1.4-1.0000.8596.8.9.gem \
    --with-cray-pmi-ext \
    --enable-mca-no-build=maffinity-first_use,maffinity-libnuma,ess-cnos,filem-rsh,grpcomm-cnos,pml-dr \
    ${ADD_COMPILER} \
    CPPFLAGS="${ADD_CPPFLAGS} -I${gniheaders}" \
    FFLAGS="${ADD_FFLAGS} -I${gniheaders}" \
    FCFLAGS="${ADD_FCFLAGS} -I/usr/include -I${gniheaders}" \
    CFLAGS="-I/usr/include -I${gniheaders}" \
    LDFLAGS="--static ${ADD_LDFLAGS} ${UGNILIBS} ${XPMEMLIBS}" \
    LIBS="${ADD_LIBS} -lpthread -lrt -lpthread -lm" | tee build.log
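Because the line is this long, it is easy to pass the same option twice without noticing; --with-memory-manager, for example, appears in it once with "none" and once with "ptmalloc2". Below is a rough sketch of how repeated options could be spotted. The helper name find_repeated_options is hypothetical (it is not part of Open MPI), and the option list fed to it is only an abbreviated sample of the line.

```shell
#!/bin/sh
# Hypothetical helper: reads one configure option per line on stdin and
# prints each option name (the part before "=") that occurs more than once.
find_repeated_options() {
    sed -e 's/=.*//' | sort | uniq -d
}

# Abbreviated sample of the options from the configure line.
printf '%s\n' \
    --enable-debug \
    --with-memory-manager=none \
    --with-threads \
    --without-tm \
    --without-slurm \
    --with-memory-manager=ptmalloc2 \
  | find_repeated_options
```

For this sample the script reports --with-memory-manager as the only repeated option name.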
Any idea?
Bin WANG
On Mon, Mar 5, 2012 at 7:13 PM, Ralph Castain <rhc.openmpi_at_[hidden]> wrote:
> How did you attempt to start your job, and what does your configure line
> look like?
>
> Sent from my iPad
>
> On Mar 5, 2012, at 2:11 PM, bin Wang <bighead521_at_[hidden]> wrote:
>
> > Hello All,
> >
> > I'm trying to run the latest OpenMPI code on Jaguar.
> > (Cloned from the Open MPI Mercurial mirror of the Subversion repository)
> > The configuration and compilation of OpenMPI were fine, and benchmark
> > was also successfully compiled. I tried to launch my program using mpirun
> > within an interactive job, but it failed immediately.
> >
> > Core dump file gave me the following information.
> > ====================Error Msg=========================
> > [jaguarpf-login2:15370] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 220
> > --------------------------------------------------------------------------
> > It looks like orte_init failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during orte_init; some of which are due to configuration or
> > environment problems. This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> > ompi_mpi_init: orte_init failed
> > --> Returned value Unable to start a daemon on the local node (-127) instead of ORTE_SUCCESS
> >
> > --------------------------------------------------------------------------
> > It looks like MPI_INIT failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during MPI_INIT; some of which are due to configuration or environment
> > problems. This failure appears to be an internal failure; here's some
> > additional information (which may only be relevant to an Open MPI
> > developer):
> > ompi_mpi_init: orte_init failed
> > --> Returned "Unable to start a daemon on the local node" (-127) instead of "Success" (0)
> > --------------------------------------------------------------------------
> > [jaguarpf-login2:15370] *** An error occurred in MPI_Init
> > [jaguarpf-login2:15370] *** reported by process [4294967295,42949
> > [jaguarpf-login2:15370] *** on a NULL communicator
> > [jaguarpf-login2:15370] *** Unknown error
> > [jaguarpf-login2:15370] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> > [jaguarpf-login2:15370] *** and potentially your MPI job)
> > --------------------------------------------------------------------------
> > An MPI process is aborting at a time when it cannot guarantee that all
> > of its peer processes in the job will be killed properly. You should
> > double check that everything has shut down cleanly.
> > Reason: Before MPI_INIT completed
> > Local host: jaguarpf-login2
> > PID: 15370
> > --------------------------------------------------------------------------
> > Program exited with code 01.
> > ====================Error Msg Over=====================
> >
> > There are several components under ess, but I don't know why and how the
> > singleton component was chosen.
> >
> > I hope someone could help me to compile and run openmpi successfully on Jaguar.
> >
> > Any comment and suggestion will be appreciated.
> >
> > Thanks,
> >
> > --Bin
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users