Hello Ralph,

Thanks for your reply.

To start my job, I tried the following two approaches:
(1) Configured and compiled Open MPI and compiled the benchmark on the head node,
      then submitted a PBS job.
(2) Submitted an interactive job and redid the configure/compile on a compute node,
      then used "/path/to/mpicc -o hello hello_world.c" to compile the benchmark
      and "/path/to/mpirun -np 2 /path/to/hello" to run the job.
I also tried running "/path/to/mpirun -np 2 hostname", but got the same error.
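
To rule out a broken install tree, a check along these lines could be run on the compute node. This is only a sketch: check_ompi_install is a hypothetical helper name, the argument is whatever was passed to --prefix, and it only confirms that mpicc, mpirun, and orted exist and are executable under bin/. It does not by itself explain the daemon failure.

```shell
# Hypothetical helper: verify that the Open MPI tools exist under a prefix.
# Usage: check_ompi_install /path/to/prefix
check_ompi_install() {
    prefix=$1
    status=0
    for tool in mpicc mpirun orted; do
        if [ -x "$prefix/bin/$tool" ]; then
            echo "ok: $prefix/bin/$tool"
        else
            echo "missing: $prefix/bin/$tool"
            status=1
        fi
    done
    # Non-zero if any of the three tools was not found or not executable.
    return $status
}
```

It would be called as check_ompi_install "$PREFIX" and returns non-zero if any of the three tools is missing.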

The configure line is pretty long.

$SRCDIR/configure \
   --prefix=$PREFIX \
   --enable-static --disable-shared --disable-dlopen --disable-pretty-print-stacktrace --disable-pty-support --disable-io-romio --enable-contrib-no-build=libnbc,vt --enable-debug \
   --with-memory-manager=none --with-threads \
   --without-tm \
   --with-wrapper-ldflags="${ADD_WRAPPER_LDFLAGS}" \
   --with-wrapper-libs="-lnsl -lpthread -lm" \
   --with-platform=optimized \
   --with-ugni=/opt/cray/ugni/2.3-1.0400.3912.4.29.gem \
   --with-ugni-libdir=/opt/cray/ugni/2.3-1.0400.3912.4.29.gem/lib64 \
   --with-ugni-includedir=/opt/cray/gni-headers/2.1-1.0400.3906.5.1.gem/include \
   --with-xpmem=/opt/cray/xpmem/0.1-2.0400.29883.4.6.gem \
   --with-xpmem-libdir=/opt/cray/xpmem/0.1-2.0400.29883.4.6.gem/lib64 \
   --enable-mem-debug --enable-mem-profile --enable-debug-symbols --enable-binaries \
   --enable-picky --enable-mpi-f77 --enable-mpi-f90 --enable-mpi-cxx --enable-mpi-cxx-seek \
   --without-slurm --with-memory-manager=ptmalloc2 \
   --with-pmi=/opt/cray/pmi/2.1.4-1.0000.8596.8.9.gem --with-cray-pmi-ext \
   --enable-mca-no-build=maffinity-first_use,maffinity-libnuma,ess-cnos,filem-rsh,grpcomm-cnos,pml-dr \
   ${ADD_COMPILER} \
   CPPFLAGS="${ADD_CPPFLAGS} -I${gniheaders}" \
   FFLAGS="${ADD_FFLAGS} -I${gniheaders}" \
   FCFLAGS="${ADD_FCFLAGS} -I/usr/include -I${gniheaders}" \
   CFLAGS="-I/usr/include -I${gniheaders}" \
   LDFLAGS="--static ${ADD_LDFLAGS} ${UGNILIBS} ${XPMEMLIBS}" \
   LIBS="${ADD_LIBS} -lpthread -lrt -lpthread -lm" | tee build.log
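
With a line this long, repeated options are easy to miss. For instance, --with-memory-manager appears twice above with different values (none and ptmalloc2), and I am not sure which one configure ends up using. A small sketch for spotting such repeats, assuming the options are passed as separate arguments; find_repeated_opts is a hypothetical helper name, not part of Open MPI or autoconf:

```shell
# Hypothetical helper: print configure option names given more than once.
# Only arguments starting with "--" are considered; any "=value" part is ignored.
find_repeated_opts() {
    for arg in "$@"; do
        case $arg in
            --*) printf '%s\n' "${arg%%=*}" ;;
        esac
    done | sort | uniq -d
}

# Example with a few of the options from the configure line above.
find_repeated_opts --with-memory-manager=none --with-threads \
    --without-tm --with-memory-manager=ptmalloc2
```

For the four example arguments this prints --with-memory-manager, the only option name that occurs twice.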

Any idea?


Bin WANG



On Mon, Mar 5, 2012 at 7:13 PM, Ralph Castain <rhc.openmpi@gmail.com> wrote:
How did you attempt to start your job, and what does your configure line look like?

On Mar 5, 2012, at 2:11 PM, bin Wang <bighead521@gmail.com> wrote:

> Hello All,
>
> I'm trying to run the latest Open MPI code on Jaguar
> (cloned from the Open MPI Mercurial mirror of the Subversion repository).
> The configuration and compilation of Open MPI went fine, and the benchmark
> also compiled successfully. I tried to launch my program using mpirun
> within an interactive job, but it failed immediately.
>
> The core dump file gave me the following information.
> ====================Error Msg=========================
> [jaguarpf-login2:15370] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local
> node in file ess_singleton_module.c at line 220
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
> ompi_mpi_init: orte_init failed
> --> Returned value Unable to start a daemon on the local node (-127) instead of ORTE_SUCCESS
>
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems.  This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
> ompi_mpi_init: orte_init failed
> --> Returned "Unable to start a daemon on the local node" (-127) instead of "Success" (0)
> --------------------------------------------------------------------------
> [jaguarpf-login2:15370] *** An error occurred in MPI_Init
> [jaguarpf-login2:15370] *** reported by process [4294967295,42949
> [jaguarpf-login2:15370] *** on a NULL communicator
> [jaguarpf-login2:15370] *** Unknown error
> [jaguarpf-login2:15370] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> [jaguarpf-login2:15370] *** and potentially your MPI job)
> --------------------------------------------------------------------------
> An MPI process is aborting at a time when it cannot guarantee that all
> of its peer processes in the job will be killed properly.  You should
> double check that everything has shut down cleanly.
> Reason:     Before MPI_INIT completed
> Local host: jaguarpf-login2
> PID:        15370
> --------------------------------------------------------------------------
> Program exited with code 01.
> ====================Error Msg Over=====================
>
> There are several components under ess, but I don't know why or how the
> singleton component was chosen.
>
> I hope someone can help me compile and run Open MPI successfully on Jaguar.
>
> Any comments and suggestions would be appreciated.
>
> Thanks,
>
> --Bin
>
> _______________________________________________
> users mailing list
> users@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
