Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Return code and error message problems
From: Ralph H Castain (rhc_at_[hidden])
Date: 2008-03-25 08:46:26


Interesting! I was running it on odin last night until around 11pm your time
without problems.

I'll take a look....

On 3/25/08 6:35 AM, "Tim Prins" <tprins_at_[hidden]> wrote:

> Hi,
>
> Something went wrong last night and all our MTT tests had the following
> output:
> [odin005.cs.indiana.edu:28167] [[46567,0],0] ORTE_ERROR_LOG: Error in file
> base/plm_base_launch_support.c at line 161
> --------------------------------------------------------------------------
> mpirun was unable to start the specified application as it encountered
> an error.
> More information may be available above.
> --------------------------------------------------------------------------
>
> I have not tracked down what caused this, but the more immediate problem
> is that after giving this error mpirun returned '0' instead of a more
> sane error value.
>
>
>
> Also, when running the test 'orte/test/mpi/abort' I get the error output:
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 1 with PID 17822 on
> node odin013 exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
>
> This is wrong; it should say that the process was aborted. It
> looks like somehow the job state is being set to
> ORTE_JOB_STATE_ABORTED_WO_SYNC instead of ORTE_JOB_STATE_ABORTED.
>
> Thanks,
>
> Tim
>
>
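
The first problem in the quoted report (an error banner is printed, yet the exit status is 0) is the kind of mismatch a test harness can check for mechanically. The following is only a minimal sketch of such a check, not how MTT or mpirun actually work. The names check_launcher and fake_launcher are hypothetical, the marker string is copied from the banner quoted above, and a small Python child process stands in for mpirun so the example runs anywhere.

```python
import subprocess
import sys

# Marker text copied from the error banner quoted in the report.
ERROR_BANNER = "was unable to start the specified application"


def check_launcher(cmd):
    """Run cmd; report its exit code, whether the banner appeared,
    and whether the two disagree (banner printed but exit code 0)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    output = result.stdout + result.stderr
    printed_error = ERROR_BANNER in output
    # A launcher that reports a failure should not exit with status 0.
    inconsistent = printed_error and result.returncode == 0
    return result.returncode, printed_error, inconsistent


def fake_launcher(exit_code):
    """Hypothetical stand-in for mpirun: prints the banner to stderr
    and exits with the given code."""
    script = (
        "import sys; "
        f"sys.stderr.write('mpirun {ERROR_BANNER}\\n'); "
        f"sys.exit({exit_code})"
    )
    return [sys.executable, "-c", script]


if __name__ == "__main__":
    for code in (0, 1):
        rc, printed, bad = check_launcher(fake_launcher(code))
        print(f"exit={rc} banner={printed} inconsistent={bad}")
```

To use this against a real launcher, the command list passed to check_launcher would be replaced with the actual mpirun invocation. The check flags only the specific combination described in the report and says nothing about what caused the launch failure.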