Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problem with 1.3.2 - need tips on debugging
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-05-29 18:40:10


You have version confusion somewhere - the error message indicates that
mpirun is looking for a component that only exists in the 1.2.x series, not
in 1.3.x. Check that both your PATH and your LD_LIBRARY_PATH point to the
1.3.2 installation.
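For example, one quick way to confirm which installation each node is
actually picking up (a sketch assuming a bash-style shell; adjust
hostnames to your cluster):

    # Which mpirun does the shell resolve, and what version is it?
    which mpirun
    ompi_info | grep "Open MPI:"

    # List the ras components this install actually contains; per the
    # above, a 1.3.x build should not be offering a "proxy" component
    ompi_info | grep " ras:"

    # Check what a non-interactive shell on a compute node sees (ssh runs
    # a non-login shell, so ~/.bashrc vs ~/.bash_profile can matter here)
    ssh compute-0-0 'which mpirun; echo $LD_LIBRARY_PATH'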

On Fri, May 29, 2009 at 12:52 PM, Jeff Layton <laytonjb_at_[hidden]> wrote:

> I've got some more information (after rebuilding Open MPI and the
> application a few times). I put
>
> -mca mpi_show_mca_params enviro
>
>
> in my mpirun line to dump the MCA parameters that were set via the
> environment.
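>
> For example, the full launch line had this shape (the process count and
> executable name are placeholders, not the actual job):
>
>   # np count and executable name are placeholders
>   mpirun -np 8 -mca mpi_show_mca_params enviro ./cfl3d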
> The run then produced the following output (warning - it's long).
>
> --------------------------------------------------------------------------
> A requested component was not found, or was unable to be opened. This
> means that this component is either not installed or is unable to be
> used on your system (e.g., sometimes this means that shared libraries
> that the component requires are unable to be found/loaded). Note that
> Open MPI stopped checking at the first component that it did not find.
>
> Host: compute-0-0.local
> Framework: ras
> Component: proxy
> --------------------------------------------------------------------------
> [compute-0-0.local:01564] [[58307,0],0] ORTE_ERROR_LOG: Error in file
> ess_hnp_module.c at line 199
> [the same "component not found" message and ORTE_ERROR_LOG line repeat
> for the other daemons: pids 01565, 01562, 01560, 01566, 01563, 01559]
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_ras_base_open failed
> --> Returned value Error (-1) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> [compute-0-0.local:01565] [[58306,0],0] ORTE_ERROR_LOG: Error in file
> runtime/orte_init.c at line 132
> [the same orte_ras_base_open failure block and ORTE_ERROR_LOG line
> repeat for pids 01564, 01562, 01566, 01560, 01563, 01559]
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_ess_set_name failed
> --> Returned value Error (-1) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> [compute-0-0.local:01566] [[58305,0],0] ORTE_ERROR_LOG: Error in file
> orted/orted_main.c at line 323
> [the same orte_ess_set_name failure block and ORTE_ERROR_LOG line
> repeat for pids 01564, 01565, 01562, 01560, 01563, 01559]
> [compute-0-0.local:01556] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file ess_singleton_module.c at line 381
> [compute-0-0.local:01556] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file ess_singleton_module.c at line 143
> [compute-0-0.local:01556] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local node in file runtime/orte_init.c at line 132
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_ess_set_name failed
> --> Returned value Unable to start a daemon on the local node (-128)
> instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> [similar "Unable to start a daemon on the local node" triplets follow
> for pids 01555, 01551, 01552, and 01554, plus another copy of the
> orte_init failure block]
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> ompi_mpi_init: orte_init failed
> --> Returned "Unable to start a daemon on the local node" (-128) instead
> of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> [compute-0-0.local:1556] Abort before MPI_INIT completed successfully; not
> able to guarantee that all other processes were killed!
> [further repeats: daemon-start failure triplets for pids 01557 and
> 01558, and more copies of the orte_init failure block]
>
> (and on and on).
>
> Does anyone have any ideas? Google let me down on this one.
>
> TIA!
>
> Jeff
>
>
>
> Good morning,
>>
>> I just built 1.3.2 on a ROCKS 4.something system. I built my code
>> (CFL3D) with the Intel 10.1 compilers. I also linked in the
>> Open MPI libs and the Intel libraries to make sure I had the paths
>> correct. When I try to run my code, I get the following:
>>
>>
>> error: executing task of job 2951 failed: execution daemon on host
>> "compute-2-3.local" didn't accept task
>> --------------------------------------------------------------------------
>>
>> A daemon (pid 12015) died unexpectedly with status 1 while attempting
>> to launch so we are aborting.
>>
>> There may be more information reported by the environment (see above).
>>
>> This may be because the daemon was unable to find all the needed shared
>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>> location of the shared libraries on the remote nodes and this will
>> automatically be forwarded to the remote nodes.
>> --------------------------------------------------------------------------
>>
>> --------------------------------------------------------------------------
>>
>> mpirun noticed that the job aborted, but has no info as to the process
>> that caused that situation.
>> --------------------------------------------------------------------------
>>
>> mpirun: clean termination accomplished
>>
>>
>>
>> Everything seems correct. I checked that I am picking up the right
>> mpirun, and that the binary resolves the right libraries (using ldd).
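>>
>> For example (the binary name here is a placeholder):
>>
>>   ldd ./cfl3d | grep -i mpi      # app binary (placeholder name)
>>   ldd `which mpirun`             # runtime libs mpirun pulls in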
>>
>> Can anyone tell me what the "status 1" means? Any tips on debugging
>> the problem?
>>
>> Thanks!
>>
>> Jeff
>>
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>