Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] can't run the code on Jaguar
From: Ralph Castain (rhc.openmpi_at_[hidden])
Date: 2012-03-05 20:13:46


How did you attempt to start your job, and what does your configure line look like?

On Mar 5, 2012, at 2:11 PM, bin Wang <bighead521_at_[hidden]> wrote:

> Hello All,
>
> I'm trying to run the latest Open MPI code on Jaguar
> (cloned from the Open MPI Mercurial mirror of the Subversion repository).
> Open MPI configured and compiled without problems, and the benchmark
> also compiled successfully. I tried to launch my program using mpirun
> within an interactive job, but it failed immediately.
>
> The core dump file gave me the following information.
> ====================Error Msg=========================
> [jaguarpf-login2:15370] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local
> node in file ess_singleton_module.c at line 220
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
> ompi_mpi_init: orte_init failed
> --> Returned value Unable to start a daemon on the local node (-127) instead of ORTE_SUCCESS
>
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
> ompi_mpi_init: orte_init failed
> --> Returned "Unable to start a daemon on the local node" (-127) instead of "Success" (0)
> --------------------------------------------------------------------------
> [jaguarpf-login2:15370] *** An error occurred in MPI_Init
> [jaguarpf-login2:15370] *** reported by process [4294967295,42949
> [jaguarpf-login2:15370] *** on a NULL communicator
> [jaguarpf-login2:15370] *** Unknown error
> [jaguarpf-login2:15370] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> [jaguarpf-login2:15370] *** and potentially your MPI job)
> --------------------------------------------------------------------------
> An MPI process is aborting at a time when it cannot guarantee that all
> of its peer processes in the job will be killed properly. You should
> double check that everything has shut down cleanly.
> Reason: Before MPI_INIT completed
> Local host: jaguarpf-login2
> PID: 15370
> --------------------------------------------------------------------------
> Program exited with code 01.
> ====================Error Msg Over=====================
>
> There are several components under ess, but I don't know why or how the
> singleton component was chosen.
>
> I hope someone can help me compile and run Open MPI successfully on Jaguar.
>
> Any comments and suggestions will be appreciated.
>
> Thanks,
>
> --Bin