Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] can't run the code on Jaguar
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-03-06 22:24:07


Wow - that's the ugliest configure line I think I've ever seen :-/

I note you have a --with-platform option in the middle of it, which is really
unusual. Normally, you would put all of those options in a platform file if
that's what you were going to do. Note that anything in the platform file
overrides any duplicates on the command line, not the other way around, so
you may not be building what you think you are.

I also noticed that you specified two conflicting --with-memory-manager
options (none and ptmalloc2), which isn't good.

There usually isn't any reason for that complex a configure - we do a
pretty good job of sensing the right thing to do. In this case, I believe
the problem is that you forgot to configure for alps support and configured
out cnos support, so there is nothing left that you can use on your system.

Take a look at contrib/platform/lanl/cray_xe6/debug-nopanasas for an
example platform file that, I believe, builds what you are seeking. I would
suggest copying and editing that one, and then configuring with just
--with-platform=<my-edited-version>
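For reference, a platform file is just a list of configure option settings, one
per line. A minimal sketch (these particular settings are illustrative only,
not the contents of the LANL file):

```shell
# Write a minimal platform file: each line sets one configure option.
# These settings are illustrative -- for a real Cray build, start from
# contrib/platform/lanl/cray_xe6/debug-nopanasas and edit it instead.
cat > my-platform <<'EOF'
enable_debug=yes
with_memory_manager=none
EOF

# Then configure with just the platform file (plus a prefix), e.g.:
#   ./configure --with-platform=./my-platform --prefix=$HOME/ompi
```

Settings on the command line that duplicate entries in the platform file are
overridden by the file, which is why mixing the two is risky.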

On Tue, Mar 6, 2012 at 3:28 PM, bin wang <bighead521_at_[hidden]> wrote:

> Hello Ralph,
>
> Thanks for your reply.
>
> In order to start my job, I tried the following two ways:
> (1) Configured/compiled Open MPI and compiled the benchmark on the head
> node, then submitted a PBS job.
> (2) Submitted an interactive job to redo the configure/compile on a compute
> node, then used "/path/to/mpicc -o hello hello_world.c" to compile the
> benchmark and "/path/to/mpirun -np 2 /path/to/hello" to run the job.
> Actually, I also tried to run "/path/to/mpirun -np 2 hostname" but got the
> same error.
>
> The configure line is pretty long.
>
> $SRCDIR/configure \
>     --prefix=$PREFIX \
>     --enable-static --disable-shared --disable-dlopen \
>     --disable-pretty-print-stacktrace --disable-pty-support --disable-io-romio \
>     --enable-contrib-no-build=libnbc,vt --enable-debug \
>     --with-memory-manager=none --with-threads \
>     --without-tm \
>     --with-wrapper-ldflags="${ADD_WRAPPER_LDFLAGS}" \
>     --with-wrapper-libs="-lnsl -lpthread -lm" \
>     --with-platform=optimized \
>     --with-ugni=/opt/cray/ugni/2.3-1.0400.3912.4.29.gem \
>     --with-ugni-libdir=/opt/cray/ugni/2.3-1.0400.3912.4.29.gem/lib64 \
>     --with-ugni-includedir=/opt/cray/gni-headers/2.1-1.0400.3906.5.1.gem/include \
>     --with-xpmem=/opt/cray/xpmem/0.1-2.0400.29883.4.6.gem \
>     --with-xpmem-libdir=/opt/cray/xpmem/0.1-2.0400.29883.4.6.gem/lib64 \
>     --enable-mem-debug --enable-mem-profile --enable-debug-symbols \
>     --enable-binaries \
>     --enable-picky --enable-mpi-f77 --enable-mpi-f90 --enable-mpi-cxx \
>     --enable-mpi-cxx-seek \
>     --without-slurm --with-memory-manager=ptmalloc2 \
>     --with-pmi=/opt/cray/pmi/2.1.4-1.0000.8596.8.9.gem \
>     --with-cray-pmi-ext \
>     --enable-mca-no-build=maffinity-first_use,maffinity-libnuma,ess-cnos,filem-rsh,grpcomm-cnos,pml-dr \
>     ${ADD_COMPILER} \
>     CPPFLAGS="${ADD_CPPFLAGS} -I${gniheaders}" \
>     FFLAGS="${ADD_FFLAGS} -I${gniheaders}" \
>     FCFLAGS="${ADD_FCFLAGS} -I/usr/include -I${gniheaders}" \
>     CFLAGS="-I/usr/include -I${gniheaders}" \
>     LDFLAGS="--static ${ADD_LDFLAGS} ${UGNILIBS} ${XPMEMLIBS}" \
>     LIBS="${ADD_LIBS} -lpthread -lrt -lpthread -lm" | tee build.log
>
> Any idea?
>
>
> Bin WANG
>
>
>
>
> On Mon, Mar 5, 2012 at 7:13 PM, Ralph Castain <rhc.openmpi_at_[hidden]> wrote:
>
>> How did you attempt to start your job, and what does your configure line
>> look like?
>>
>> Sent from my iPad
>>
>> On Mar 5, 2012, at 2:11 PM, bin Wang <bighead521_at_[hidden]> wrote:
>>
>> > Hello All,
>> >
>> > I'm trying to run the latest OpenMPI code on Jaguar.
>> > (Cloned from the Open MPI Mercurial mirror of the Subversion repository)
>> > The configuration and compilation of OpenMPI were fine, and benchmark
>> > was also successfully compiled. I tried to launch my program using
>> mpirun
>> > within an interactive job, but it failed immediately.
>> >
>> > Core dump file gave me the following information.
>> > ====================Error Msg=========================
>> > [jaguarpf-login2:15370] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
>> start a daemon on the local
>> > node in file ess_singleton_module.c at line 220
>> >
>> --------------------------------------------------------------------------
>> > It looks like orte_init failed for some reason; your parallel process is
>> > likely to abort. There are many reasons that a parallel process can
>> > fail during orte_init; some of which are due to configuration or
>> > environment problems. This failure appears to be an internal failure;
>> > here's some additional information (which may only be relevant to an
>> > Open MPI developer):
>> > ompi_mpi_init: orte_init failed
>> > --> Returned value Unable to start a daemon on the local node (-127)
>> instead of ORTE_SUCCESS
>> >
>> >
>> --------------------------------------------------------------------------
>> > It looks like MPI_INIT failed for some reason; your parallel process is
>> > likely to abort. There are many reasons that a parallel process can
>> > fail during MPI_INIT; some of which are due to configuration or environment
>> > problems. This failure appears to be an internal failure; here's some
>> > additional information (which may only be relevant to an Open MPI
>> > developer):
>> > ompi_mpi_init: orte_init failed
>> > --> Returned "Unable to start a daemon on the local node" (-127)
>> > instead of "Success" (0)
>> >
>> --------------------------------------------------------------------------
>> > [jaguarpf-login2:15370] *** An error occurred in MPI_Init
>> > [jaguarpf-login2:15370] *** reported by process [4294967295,4294967295]
>> > [jaguarpf-login2:15370] *** on a NULL communicator
>> > [jaguarpf-login2:15370] *** Unknown error
>> > [jaguarpf-login2:15370] *** MPI_ERRORS_ARE_FATAL (processes in this
>> communicator will now abort,
>> > [jaguarpf-login2:15370] *** and potentially your MPI job)
>> >
>> --------------------------------------------------------------------------
>> > An MPI process is aborting at a time when it cannot guarantee that all
>> > of its peer processes in the job will be killed properly. You should
>> > double check that everything has shut down cleanly.
>> > Reason: Before MPI_INIT completed
>> > Local host: jaguarpf-login2
>> > PID: 15370
>> >
>> --------------------------------------------------------------------------
>> > Program exited with code 01.
>> > ====================Error Msg Over=====================
>> >
>> > There are several components under ess, but I don't know why and how the
>> > singleton component was chosen.
>> >
>> > I hope someone could help me to compile and run openmpi successfully on
>> Jaguar.
>> >
>> > Any comment and suggestion will be appreciated.
>> >
>> > Thanks,
>> >
>> > --Bin
>> >
>> > _______________________________________________
>> > users mailing list
>> > users_at_[hidden]
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
>