Subject: Re: [OMPI users] Problem with 1.3.2 - need tips on debugging
From: Jeff Layton (laytonjb_at_[hidden])
Date: 2009-05-29 14:52:46


I've got some more information (after rebuilding Open MPI and the
application a few times). I put

-mca mpi_show_mca_params enviro

in my mpirun line to get some of the MCA parameter information reported.
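
For reference, the full launch line looks roughly like this (the process
count and the executable name below are placeholders, not my exact command):

  # process count and executable name are placeholders
  mpirun -np 8 -mca mpi_show_mca_params enviro ./cfl3d
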
Here's what I get back (warning - it's long).

--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
Open MPI stopped checking at the first component that it did not find.

Host: compute-0-0.local
Framework: ras
Component: proxy
--------------------------------------------------------------------------
[compute-0-0.local:01564] [[58307,0],0] ORTE_ERROR_LOG: Error in file
ess_hnp_module.c at line 199
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
Open MPI stopped checking at the first component that it did not find.

Host: compute-0-0.local
Framework: ras
Component: proxy
--------------------------------------------------------------------------
[compute-0-0.local:01565] [[58306,0],0] ORTE_ERROR_LOG: Error in file
ess_hnp_module.c at line 199
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
Open MPI stopped checking at the first component that it did not find.

Host: compute-0-0.local
Framework: ras
Component: proxy
--------------------------------------------------------------------------
[compute-0-0.local:01562] [[58309,0],0] ORTE_ERROR_LOG: Error in file
ess_hnp_module.c at line 199
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
Open MPI stopped checking at the first component that it did not find.

Host: compute-0-0.local
Framework: ras
Component: proxy
--------------------------------------------------------------------------
[compute-0-0.local:01560] [[58311,0],0] ORTE_ERROR_LOG: Error in file
ess_hnp_module.c at line 199
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
Open MPI stopped checking at the first component that it did not find.

Host: compute-0-0.local
Framework: ras
Component: proxy
--------------------------------------------------------------------------
[compute-0-0.local:01566] [[58305,0],0] ORTE_ERROR_LOG: Error in file
ess_hnp_module.c at line 199
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
Open MPI stopped checking at the first component that it did not find.

Host: compute-0-0.local
Framework: ras
Component: proxy
--------------------------------------------------------------------------
[compute-0-0.local:01563] [[58308,0],0] ORTE_ERROR_LOG: Error in file
ess_hnp_module.c at line 199
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
Open MPI stopped checking at the first component that it did not find.

Host: compute-0-0.local
Framework: ras
Component: proxy
--------------------------------------------------------------------------
[compute-0-0.local:01559] [[58312,0],0] ORTE_ERROR_LOG: Error in file
ess_hnp_module.c at line 199
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ras_base_open failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[compute-0-0.local:01565] [[58306,0],0] ORTE_ERROR_LOG: Error in file
runtime/orte_init.c at line 132
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ras_base_open failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[compute-0-0.local:01564] [[58307,0],0] ORTE_ERROR_LOG: Error in file
runtime/orte_init.c at line 132
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ras_base_open failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[compute-0-0.local:01562] [[58309,0],0] ORTE_ERROR_LOG: Error in file
runtime/orte_init.c at line 132
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ras_base_open failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[compute-0-0.local:01566] [[58305,0],0] ORTE_ERROR_LOG: Error in file
runtime/orte_init.c at line 132
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ras_base_open failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[compute-0-0.local:01560] [[58311,0],0] ORTE_ERROR_LOG: Error in file
runtime/orte_init.c at line 132
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ras_base_open failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[compute-0-0.local:01563] [[58308,0],0] ORTE_ERROR_LOG: Error in file
runtime/orte_init.c at line 132
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ras_base_open failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[compute-0-0.local:01559] [[58312,0],0] ORTE_ERROR_LOG: Error in file
runtime/orte_init.c at line 132
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[compute-0-0.local:01566] [[58305,0],0] ORTE_ERROR_LOG: Error in file
orted/orted_main.c at line 323
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[compute-0-0.local:01564] [[58307,0],0] ORTE_ERROR_LOG: Error in file
orted/orted_main.c at line 323
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[compute-0-0.local:01565] [[58306,0],0] ORTE_ERROR_LOG: Error in file
orted/orted_main.c at line 323
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[compute-0-0.local:01562] [[58309,0],0] ORTE_ERROR_LOG: Error in file
orted/orted_main.c at line 323
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[compute-0-0.local:01560] [[58311,0],0] ORTE_ERROR_LOG: Error in file
orted/orted_main.c at line 323
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[compute-0-0.local:01563] [[58308,0],0] ORTE_ERROR_LOG: Error in file
orted/orted_main.c at line 323
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[compute-0-0.local:01559] [[58312,0],0] ORTE_ERROR_LOG: Error in file
orted/orted_main.c at line 323
[compute-0-0.local:01556] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
start a daemon on the local node in file ess_singleton_module.c at line 381
[compute-0-0.local:01556] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
start a daemon on the local node in file ess_singleton_module.c at line 143
[compute-0-0.local:01556] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
start a daemon on the local node in file runtime/orte_init.c at line 132
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Unable to start a daemon on the local node (-128)
instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[compute-0-0.local:01555] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
start a daemon on the local node in file ess_singleton_module.c at line 381
[compute-0-0.local:01555] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
start a daemon on the local node in file ess_singleton_module.c at line 143
[compute-0-0.local:01551] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
start a daemon on the local node in file ess_singleton_module.c at line 381
[compute-0-0.local:01551] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
start a daemon on the local node in file ess_singleton_module.c at line 143
[compute-0-0.local:01551] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
start a daemon on the local node in file runtime/orte_init.c at line 132
[compute-0-0.local:01552] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
start a daemon on the local node in file ess_singleton_module.c at line 381
[compute-0-0.local:01552] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
start a daemon on the local node in file ess_singleton_module.c at line 143
[compute-0-0.local:01552] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
start a daemon on the local node in file runtime/orte_init.c at line 132
[compute-0-0.local:01554] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
start a daemon on the local node in file ess_singleton_module.c at line 381
[compute-0-0.local:01554] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
start a daemon on the local node in file ess_singleton_module.c at line 143
[compute-0-0.local:01554] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
start a daemon on the local node in file runtime/orte_init.c at line 132
[compute-0-0.local:01555] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
start a daemon on the local node in file runtime/orte_init.c at line 132
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Unable to start a daemon on the local node (-128)
instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: orte_init failed
  --> Returned "Unable to start a daemon on the local node" (-128)
instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[compute-0-0.local:1556] Abort before MPI_INIT completed successfully;
not able to guarantee that all other processes were killed!
[compute-0-0.local:01557] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
start a daemon on the local node in file ess_singleton_module.c at line 381
[compute-0-0.local:01557] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
start a daemon on the local node in file ess_singleton_module.c at line 143
[compute-0-0.local:01557] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
start a daemon on the local node in file runtime/orte_init.c at line 132
[compute-0-0.local:01558] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
start a daemon on the local node in file ess_singleton_module.c at line 381
[compute-0-0.local:01558] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
start a daemon on the local node in file ess_singleton_module.c at line 143
[compute-0-0.local:01558] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
start a daemon on the local node in file runtime/orte_init.c at line 132
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Unable to start a daemon on the local node (-128)
instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;

(and on and on).

Does anyone have any ideas? Google let me down on this one.

TIA!

Jeff

> Good morning,
>
> I just built 1.3.2 on a ROCKS 4.something system. I built my code
> (CFL3D) with the Intel 10.1 compilers. I also linked in the
> Open MPI libs and the Intel libraries to make sure I had the paths
> correct. When I try running my code, I get the following,
>
>
> error: executing task of job 2951 failed: execution daemon on host
> "compute-2-3.local" didn't accept task
> --------------------------------------------------------------------------
>
> A daemon (pid 12015) died unexpectedly with status 1 while attempting
> to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have
> the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
>
> --------------------------------------------------------------------------
>
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
>
> mpirun: clean termination accomplished
>
>
>
> Everything seems correct. I checked that I'm picking up the right
> mpirun and that the binary finds the correct libraries (checked with ldd).
>
> Can anyone tell me what the "status 1" means? Any tips on debugging
> the problem?
>
> Thanks!
>
> Jeff