Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Return code and error message problems
From: Ralph H Castain (rhc_at_[hidden])
Date: 2008-03-25 08:46:26


Interesting! I was running it on odin last night until around 11pm your time
without problems.

I'll take a look....

On 3/25/08 6:35 AM, "Tim Prins" <tprins_at_[hidden]> wrote:

> Hi,
>
> Something went wrong last night and all our MTT tests had the following
> output:
> [odin005.cs.indiana.edu:28167] [[46567,0],0] ORTE_ERROR_LOG: Error in file
> base/plm_base_launch_support.c at line 161
> --------------------------------------------------------------------------
> mpirun was unable to start the specified application as it encountered
> an error.
> More information may be available above.
> --------------------------------------------------------------------------
>
> I have not tracked down what caused this, but the more immediate problem
> is that after giving this error mpirun returned '0' instead of a
> nonzero error value.
>
>
>
> Also, when running the test 'orte/test/mpi/abort' I get the error output:
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 1 with PID 17822 on
> node odin013 exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
>
> Which is wrong: it should say that the process was aborted. It
> looks like somehow the job state is being set to
> ORTE_JOB_STATE_ABORTED_WO_SYNC instead of ORTE_JOB_STATE_ABORTED.
>
> Thanks,
>
> Tim
>
>
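
For anyone who wants to catch the first problem from a test harness, here is a minimal sketch in Python of a check that flags a run whose output contains the launch-failure banner while the exit status is still 0. It is not how MTT does it; the function name `check_launch` and the `ERROR_MARKER` constant are made up for illustration, and the marker text is simply copied from the output quoted above.

```python
import subprocess
import sys

# Assumption: this substring of the banner quoted above is enough to
# recognise a failed launch. It is not an official Open MPI constant.
ERROR_MARKER = "was unable to start the specified application"


def check_launch(cmd, marker=ERROR_MARKER):
    """Run cmd and return (returncode, inconsistent).

    inconsistent is True when the output contains the error banner
    but the process still exited with status 0.
    """
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # the banner may go to stderr
        text=True,
    )
    saw_error = marker in proc.stdout
    inconsistent = saw_error and proc.returncode == 0
    return proc.returncode, inconsistent


if __name__ == "__main__":
    # Usage: python check_launch.py mpirun -np 2 ./my_test
    # (./my_test is a placeholder for whatever program is being launched)
    rc, bad = check_launch(sys.argv[1:])
    if bad:
        print("error banner printed but exit status was 0")
    sys.exit(1 if bad else rc)
```

The check only compares the captured output against the exit status, so it would work the same way for any launcher that prints a recognisable error banner.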