Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] mca:base:select:( ess) No component selected!
From: Ralph Castain (rhc_at_[hidden])
Date: 2008-09-24 14:36:53


So this is a singleton comm_spawn scenario that requires you to specify
a launch_agent to execute? Just trying to ensure I understand.

First, let me ensure we have a common understanding of what
orte_launch_agent does. Basically, that param stipulates the command
to be used in place of "orted" - it doesn't substitute for "ssh". So
if you set -mca orte_launch_agent foo, what will happen is: "ssh
nodename foo" instead of "ssh nodename orted".

The intent was to provide a way to do things like run valgrind on the
orted itself. So you could do -mca orte_launch_agent "valgrind orted",
and we would dutifully run "ssh nodename valgrind orted".

Or if you wanted to write your own orted (e.g., bar-orted), you could
substitute it for our "orted".

Or if you wanted to set mca params solely to be seen on the backend
nodes/procs, you could set -mca orte_launch_agent "orted -mca foo
bar", and we would launch "ssh nodename orted -mca foo bar". This
allows us to set mca params without having mpirun see them - helps us
to look at debug output, for example, from only the backend procs.
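
As a rough illustration of the substitution described above (not Open MPI's actual implementation), the remote command can be thought of as being assembled along these lines. The function name build_remote_command and its defaults are hypothetical and exist only for this sketch.

```python
import shlex


def build_remote_command(nodename, launch_agent="orted", rsh_agent="ssh"):
    """Compose the remote launch command as described in this message.

    Hypothetical helper: the launch agent string takes the place of
    "orted"; it never replaces the rsh/ssh agent itself.
    """
    # Split the agent string so that "valgrind orted" becomes two words.
    return [rsh_agent, nodename] + shlex.split(launch_agent)


if __name__ == "__main__":
    for agent in ("orted", "foo", "valgrind orted", "orted -mca foo bar"):
        print(" ".join(build_remote_command("nodename", agent)))
```

The point of the sketch is only that the agent string is spliced in where "orted" would otherwise appear, so anything placed in it ends up on the backend command line and not on mpirun's.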

If what you need to do is set something in the environment for the
orted, there are certain cmd line options that will do that for you -
orte_launch_agent may or may not be a good method.
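
If a wrapper is used as the launch agent to prepare the environment, one possible shape is sketched below. It rests on an assumption not confirmed in this thread: that whatever options the launcher appends after the agent name must be passed through unchanged to orted. The install prefix, the choice of PATH and LD_LIBRARY_PATH, and the helper names are all placeholders.

```python
import os
import sys

# Placeholder install prefix: an assumption, not taken from the thread.
OMPI_PREFIX = "/opt/openmpi"


def prepend_path(new_dir, existing):
    """Put new_dir in front of an os.pathsep-separated search path."""
    return new_dir + os.pathsep + existing if existing else new_dir


def prepare_environment(env, prefix=OMPI_PREFIX):
    """Return a copy of env with the prefix's bin/ and lib/ directories first."""
    new_env = dict(env)
    new_env["PATH"] = prepend_path(
        os.path.join(prefix, "bin"), env.get("PATH", ""))
    new_env["LD_LIBRARY_PATH"] = prepend_path(
        os.path.join(prefix, "lib"), env.get("LD_LIBRARY_PATH", ""))
    return new_env


def launch(argv, env, exec_fn=os.execvpe):
    """Replace this process with orted, forwarding every argument unchanged.

    Assumption: the launcher appends orted's own options after the agent
    name, so argv[1:] must reach orted untouched.
    """
    exec_fn("orted", ["orted"] + list(argv[1:]), prepare_environment(env))

# A real wrapper script would end by calling: launch(sys.argv, os.environ)
```

The exec function is a parameter only so the argument forwarding can be checked without actually starting orted.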

Perhaps it would help if you could tell me exactly what you wanted to
have orte_launch_agent actually do?

Thanks
Ralph

On Sep 24, 2008, at 12:22 PM, Will Portnoy wrote:

> Sorry for the miscommunication: The processes are started by my
> program with MPI_Comm_spawn, so there was no mpirun involved.
>
> If you can suggest a test program I can use with mpirun to validate my
> openmpi environment and install, that would probably produce the
> output you would like to see.
>
> But I'm not sure that will make it clear how the file pointed to by
> "orte_launch_agent" in "mca-params.conf" should be written to set up an
> environment and start orted.
>
> Will
>
> On Wed, Sep 24, 2008 at 2:17 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>> Afraid I am confused. This was the entire output from the job?? If so,
>> then that means mpirun itself wasn't able to find a launch environment
>> it could use, so you never got to the point of actually launching an
>> orted.
>>
>> Do you have ssh in your path? My best immediate guess is that you don't,
>> and that mpirun therefore doesn't see anything it can use to launch a
>> job. We have discussed internally that we need to improve that error
>> message - could be this is another case emphasizing that point.
>>
>> 1.3 is fine to use - still patching some bugs, but nothing that should
>> impact this issue.
>>
>> Ralph
>>
>> On Sep 24, 2008, at 12:11 PM, Will Portnoy wrote:
>>
>>> That was the output with plm_base_verbose set to 99 - it's the same
>>> output with 1.
>>>
>>> Yes, I'd like to use ssh.
>>>
>>> orted wasn't starting properly with orte_launch_agent (which was
>>> needed because my environment on the target machine wasn't set up), so
>>> that's why I thought I would try it directly on the command line on
>>> localhost. I thought this was a simpler case: to verify that orted
>>> could find all of its necessary components without the complexity of
>>> everything else I'm doing.
>>>
>>> If I needed to use orte_launch_agent, how should I pass the necessary
>>> parameters to start orted after I set up my environment?
>>>
>>> Am I better off using trunk over 1.3?
>>>
>>> thank you,
>>>
>>> Will
>>>
>>> On Wed, Sep 24, 2008 at 2:01 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>>
>>>> Could you rerun that with -mca plm_base_verbose 1? What environment
>>>> are you in - I assume rsh/ssh?
>>>>
>>>> I would like to see the cmd line being used to launch the orted. What
>>>> this indicates is that we are not getting the cmd line correct. Could
>>>> just be that some patch in the trunk didn't get completely applied to
>>>> the 1.3 branch.
>>>>
>>>> BTW: you probably can't run orted directly off of the cmd line. It
>>>> likely needs some cmd line params to get critical info.
>>>>
>>>> Ralph
>>>>
>>>> On Sep 24, 2008, at 9:47 AM, Will Portnoy wrote:
>>>>
>>>>> I'm trying to use MPI_Comm_spawn with MPI_Info's host key to spawn
>>>>> processes from a process not started with mpirun. This works with the
>>>>> host key set to the localhost's hostname, but it does not work when I
>>>>> use other hosts.
>>>>>
>>>>> I'm using version 1.3a1r19602. I need to use orte_launch_agent to set
>>>>> up my environment a bit before orted is started, but it fails with
>>>>> errors listed below.
>>>>>
>>>>> When I try to run orted directly on the command line with some of the
>>>>> verbosity flags turned to "11", I receive the same messages.
>>>>>
>>>>> Does anybody have any suggestions?
>>>>>
>>>>> thank you,
>>>>>
>>>>> Will
>>>>>
>>>>>
>>>>> [fqdn:24761] mca: base: components_open: Looking for ess components
>>>>> [fqdn:24761] mca: base: components_open: opening ess components
>>>>> [fqdn:24761] mca: base: components_open: found loaded component env
>>>>> [fqdn:24761] mca: base: components_open: component env has no register function
>>>>> [fqdn:24761] mca: base: components_open: component env open function successful
>>>>> [fqdn:24761] mca: base: components_open: found loaded component hnp
>>>>> [fqdn:24761] mca: base: components_open: component hnp has no register function
>>>>> [fqdn:24761] mca: base: components_open: component hnp open function successful
>>>>> [fqdn:24761] mca: base: components_open: found loaded component singleton
>>>>> [fqdn:24761] mca: base: components_open: component singleton has no register function
>>>>> [fqdn:24761] mca: base: components_open: component singleton open function successful
>>>>> [fqdn:24761] mca: base: components_open: found loaded component slurm
>>>>> [fqdn:24761] mca: base: components_open: component slurm has no register function
>>>>> [fqdn:24761] mca: base: components_open: component slurm open function successful
>>>>> [fqdn:24761] mca: base: components_open: found loaded component tool
>>>>> [fqdn:24761] mca: base: components_open: component tool has no register function
>>>>> [fqdn:24761] mca: base: components_open: component tool open function successful
>>>>> [fqdn:24761] mca:base:select: Auto-selecting ess components
>>>>> [fqdn:24761] mca:base:select:( ess) Querying component [env]
>>>>> [fqdn:24761] mca:base:select:( ess) Skipping component [env]. Query failed to return a module
>>>>> [fqdn:24761] mca:base:select:( ess) Querying component [hnp]
>>>>> [fqdn:24761] mca:base:select:( ess) Skipping component [hnp]. Query failed to return a module
>>>>> [fqdn:24761] mca:base:select:( ess) Querying component [singleton]
>>>>> [fqdn:24761] mca:base:select:( ess) Skipping component [singleton]. Query failed to return a module
>>>>> [fqdn:24761] mca:base:select:( ess) Querying component [slurm]
>>>>> [fqdn:24761] mca:base:select:( ess) Skipping component [slurm]. Query failed to return a module
>>>>> [fqdn:24761] mca:base:select:( ess) Querying component [tool]
>>>>> [fqdn:24761] mca:base:select:( ess) Skipping component [tool]. Query failed to return a module
>>>>> [fqdn:24761] mca:base:select:( ess) No component selected!
>>>>> [fqdn:24761] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 125
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> It looks like orte_init failed for some reason; your parallel process is
>>>>> likely to abort. There are many reasons that a parallel process can
>>>>> fail during orte_init; some of which are due to configuration or
>>>>> environment problems. This failure appears to be an internal failure;
>>>>> here's some additional information (which may only be relevant to an
>>>>> Open MPI developer):
>>>>>
>>>>> orte_ess_base_select failed
>>>>> --> Returned value Not found (-13) instead of ORTE_SUCCESS
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> [fqdn:24761] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file orted/orted_main.c at line 315
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users