Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problem with 1.3.2 - need tips on debugging
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-05-29 18:40:10


You have version confusion somewhere: the error message indicates that
mpirun is looking for a component (framework "ras", component "proxy") that
exists only in the 1.2.x series, not in 1.3.x. Check that both your PATH and
your LD_LIBRARY_PATH point to the 1.3.2 installation.
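
One way to carry out the check suggested above is to list every copy of
mpirun and of the MPI library that the two variables expose, in search order;
the first hit in each list is the one that actually gets used. The following
is only a minimal sketch in Python. The function names and the "libmpi"
file-name prefix are illustrative assumptions, not part of Open MPI, and the
script only inspects the environment it is started in, so it may need to be
run on a compute node as well as on the head node.

```python
import os


def find_executables(name, path_value):
    """Return every executable called `name` found via a PATH-style string,
    in search order. Empty entries are skipped in this sketch."""
    hits = []
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits


def find_libraries(prefix, ld_library_path):
    """Return every file whose name starts with `prefix` in the directories
    of an LD_LIBRARY_PATH-style string, in search order."""
    hits = []
    for directory in ld_library_path.split(os.pathsep):
        if not directory or not os.path.isdir(directory):
            continue
        for entry in sorted(os.listdir(directory)):
            if entry.startswith(prefix):
                hits.append(os.path.join(directory, entry))
    return hits


def report(environ):
    """Build a short text report of all mpirun and libmpi* candidates."""
    sections = (
        ("mpirun on PATH",
         find_executables("mpirun", environ.get("PATH", ""))),
        # "libmpi" is an assumed prefix for the MPI shared library files.
        ("libmpi* on LD_LIBRARY_PATH",
         find_libraries("libmpi", environ.get("LD_LIBRARY_PATH", ""))),
    )
    lines = []
    for label, hits in sections:
        lines.append(label + ":")
        if hits:
            lines.extend("  " + hit for hit in hits)
        else:
            lines.append("  (none found)")
    return "\n".join(lines)


if __name__ == "__main__":
    print(report(os.environ))
```

If either list shows more than one installation, or the first entry does not
belong to the 1.3.2 tree, that is a likely source of the mismatch.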

On Fri, May 29, 2009 at 12:52 PM, Jeff Layton <laytonjb_at_[hidden]> wrote:

> I've got some more information (after rebuilding Open MPI and the
> application a few times). I put
>
> -mca mpi_show_mca_params enviro
>
> in my mpirun line to get some of the parameter information back.
> I get the following information back (warning - it's long).
>
> --------------------------------------------------------------------------
> A requested component was not found, or was unable to be opened. This
> means that this component is either not installed or is unable to be
> used on your system (e.g., sometimes this means that shared libraries
> that the component requires are unable to be found/loaded). Note that
> Open MPI stopped checking at the first component that it did not find.
>
> Host: compute-0-0.local
> Framework: ras
> Component: proxy
> --------------------------------------------------------------------------
> [compute-0-0.local:01564] [[58307,0],0] ORTE_ERROR_LOG: Error in file
> ess_hnp_module.c at line 199
> --------------------------------------------------------------------------
> A requested component was not found, or was unable to be opened. This
> means that this component is either not installed or is unable to be
> used on your system (e.g., sometimes this means that shared libraries
> that the component requires are unable to be found/loaded). Note that
> Open MPI stopped checking at the first component that it did not find.
>
> Host: compute-0-0.local
> Framework: ras
> Component: proxy
> --------------------------------------------------------------------------
> [compute-0-0.local:01565] [[58306,0],0] ORTE_ERROR_LOG: Error in file
> ess_hnp_module.c at line 199
> --------------------------------------------------------------------------
> A requested component was not found, or was unable to be opened. This
> means that this component is either not installed or is unable to be
> used on your system (e.g., sometimes this means that shared libraries
> that the component requires are unable to be found/loaded). Note that
> Open MPI stopped checking at the first component that it did not find.
>
> Host: compute-0-0.local
> Framework: ras
> Component: proxy
> --------------------------------------------------------------------------
> [compute-0-0.local:01562] [[58309,0],0] ORTE_ERROR_LOG: Error in file
> ess_hnp_module.c at line 199
> --------------------------------------------------------------------------
> A requested component was not found, or was unable to be opened. This
> means that this component is either not installed or is unable to be
> used on your system (e.g., sometimes this means that shared libraries
> that the component requires are unable to be found/loaded). Note that
> Open MPI stopped checking at the first component that it did not find.
>
> Host: compute-0-0.local
> Framework: ras
> Component: proxy
> --------------------------------------------------------------------------
> [compute-0-0.local:01560] [[58311,0],0] ORTE_ERROR_LOG: Error in file
> ess_hnp_module.c at line 199
> --------------------------------------------------------------------------
> A requested component was not found, or was unable to be opened. This
> means that this component is either not installed or is unable to be
> used on your system (e.g., sometimes this means that shared libraries
> that the component requires are unable to be found/loaded). Note that
> Open MPI stopped checking at the first component that it did not find.
>
> Host: compute-0-0.local
> Framework: ras
> Component: proxy
> --------------------------------------------------------------------------
> [compute-0-0.local:01566] [[58305,0],0] ORTE_ERROR_LOG: Error in file
> ess_hnp_module.c at line 199
> --------------------------------------------------------------------------
> A requested component was not found, or was unable to be opened. This
> means that this component is either not installed or is unable to be
> used on your system (e.g., sometimes this means that shared libraries
> that the component requires are unable to be found/loaded). Note that
> Open MPI stopped checking at the first component that it did not find.
>
> Host: compute-0-0.local
> Framework: ras
> Component: proxy
> --------------------------------------------------------------------------
> [compute-0-0.local:01563] [[58308,0],0] ORTE_ERROR_LOG: Error in file
> ess_hnp_module.c at line 199
> --------------------------------------------------------------------------
> A requested component was not found, or was unable to be opened. This
> means that this component is either not installed or is unable to be
> used on your system (e.g., sometimes this means that shared libraries
> that the component requires are unable to be found/loaded). Note that
> Open MPI stopped checking at the first component that it did not find.
>
> Host: compute-0-0.local
> Framework: ras
> Component: proxy
> --------------------------------------------------------------------------
> [compute-0-0.local:01559] [[58312,0],0] ORTE_ERROR_LOG: Error in file
> ess_hnp_module.c at line 199
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_ras_base_open failed
> --> Returned value Error (-1) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> [compute-0-0.local:01565] [[58306,0],0] ORTE_ERROR_LOG: Error in file
> runtime/orte_init.c at line 132
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_ras_base_open failed
> --> Returned value Error (-1) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> [compute-0-0.local:01564] [[58307,0],0] ORTE_ERROR_LOG: Error in file
> runtime/orte_init.c at line 132
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_ras_base_open failed
> --> Returned value Error (-1) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> [compute-0-0.local:01562] [[58309,0],0] ORTE_ERROR_LOG: Error in file
> runtime/orte_init.c at line 132
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_ras_base_open failed
> --> Returned value Error (-1) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> [compute-0-0.local:01566] [[58305,0],0] ORTE_ERROR_LOG: Error in file
> runtime/orte_init.c at line 132
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_ras_base_open failed
> --> Returned value Error (-1) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> [compute-0-0.local:01560] [[58311,0],0] ORTE_ERROR_LOG: Error in file
> runtime/orte_init.c at line 132
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_ras_base_open failed
> --> Returned value Error (-1) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> [compute-0-0.local:01563] [[58308,0],0] ORTE_ERROR_LOG: Error in file
> runtime/orte_init.c at line 132
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_ras_base_open failed
> --> Returned value Error (-1) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> [compute-0-0.local:01559] [[58312,0],0] ORTE_ERROR_LOG: Error in file
> runtime/orte_init.c at line 132
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_ess_set_name failed
> --> Returned value Error (-1) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> [compute-0-0.local:01566] [[58305,0],0] ORTE_ERROR_LOG: Error in file
> orted/orted_main.c at line 323
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_ess_set_name failed
> --> Returned value Error (-1) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> [compute-0-0.local:01564] [[58307,0],0] ORTE_ERROR_LOG: Error in file
> orted/orted_main.c at line 323
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_ess_set_name failed
> --> Returned value Error (-1) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> [compute-0-0.local:01565] [[58306,0],0] ORTE_ERROR_LOG: Error in file
> orted/orted_main.c at line 323
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_ess_set_name failed
> --> Returned value Error (-1) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> [compute-0-0.local:01562] [[58309,0],0] ORTE_ERROR_LOG: Error in file
> orted/orted_main.c at line 323
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_ess_set_name failed
> --> Returned value Error (-1) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> [compute-0-0.local:01560] [[58311,0],0] ORTE_ERROR_LOG: Error in file
> orted/orted_main.c at line 323
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_ess_set_name failed
> --> Returned value Error (-1) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> [compute-0-0.local:01563] [[58308,0],0] ORTE_ERROR_LOG: Error in file
> orted/orted_main.c at line 323
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_ess_set_name failed
> --> Returned value Error (-1) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> [compute-0-0.local:01559] [[58312,0],0] ORTE_ERROR_LOG: Error in file
> orted/orted_main.c at line 323
> [compute-0-0.local:01556] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file ess_singleton_module.c at line 381
> [compute-0-0.local:01556] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file ess_singleton_module.c at line 143
> [compute-0-0.local:01556] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file runtime/orte_init.c at line 132
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_ess_set_name failed
> --> Returned value Unable to start a daemon on the local node (-128)
> instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> [compute-0-0.local:01555] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file ess_singleton_module.c at line 381
> [compute-0-0.local:01555] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file ess_singleton_module.c at line 143
> [compute-0-0.local:01551] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file ess_singleton_module.c at line 381
> [compute-0-0.local:01551] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file ess_singleton_module.c at line 143
> [compute-0-0.local:01551] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file runtime/orte_init.c at line 132
> [compute-0-0.local:01552] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file ess_singleton_module.c at line 381
> [compute-0-0.local:01552] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file ess_singleton_module.c at line 143
> [compute-0-0.local:01552] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file runtime/orte_init.c at line 132
> [compute-0-0.local:01554] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file ess_singleton_module.c at line 381
> [compute-0-0.local:01554] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file ess_singleton_module.c at line 143
> [compute-0-0.local:01554] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file runtime/orte_init.c at line 132
> [compute-0-0.local:01555] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file runtime/orte_init.c at line 132
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_ess_set_name failed
> --> Returned value Unable to start a daemon on the local node (-128)
> instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> ompi_mpi_init: orte_init failed
> --> Returned "Unable to start a daemon on the local node" (-128) instead
> of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> [compute-0-0.local:1556] Abort before MPI_INIT completed successfully; not
> able to guarantee that all other processes were killed!
> [compute-0-0.local:01557] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file ess_singleton_module.c at line 381
> [compute-0-0.local:01557] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file ess_singleton_module.c at line 143
> [compute-0-0.local:01557] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file runtime/orte_init.c at line 132
> [compute-0-0.local:01558] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file ess_singleton_module.c at line 381
> [compute-0-0.local:01558] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file ess_singleton_module.c at line 143
> [compute-0-0.local:01558] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file runtime/orte_init.c at line 132
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_ess_set_name failed
> --> Returned value Unable to start a daemon on the local node (-128)
> instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
>
> (and on and on).
>
> Does anyone have any ideas? Google let me down on this one.
>
> TIA!
>
> Jeff
>
>
>
>> Good morning,
>>
>> I just built 1.3.2 on a ROCKS 4.something system. I built my code
>> (CFL3D) with the Intel 10.1 compilers. I also linked in the
>> Open MPI libs and the Intel libraries to make sure I had the paths
>> correct. When I try running my code, I get the following,
>>
>>
>> error: executing task of job 2951 failed: execution daemon on host
>> "compute-2-3.local" didn't accept task
>> --------------------------------------------------------------------------
>>
>> A daemon (pid 12015) died unexpectedly with status 1 while attempting
>> to launch so we are aborting.
>>
>> There may be more information reported by the environment (see above).
>>
>> This may be because the daemon was unable to find all the needed shared
>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>> location of the shared libraries on the remote nodes and this will
>> automatically be forwarded to the remote nodes.
>> --------------------------------------------------------------------------
>>
>> --------------------------------------------------------------------------
>>
>> mpirun noticed that the job aborted, but has no info as to the process
>> that caused that situation.
>> --------------------------------------------------------------------------
>>
>> mpirun: clean termination accomplished
>>
>>
>>
>> Everything seems correct. I checked that the mpirun was correct
>> and the binary has the correct libraries (checked using ldd).
>>
>> Can anyone tell me what the "status 1" means? Any tips on debugging
>> the problem?
>>
>> Thanks!
>>
>> Jeff
>>
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>