Open MPI User's Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all Open MPI User's mailing list

Subject: Re: [OMPI users] 'orte_ess_base_select failed'
From: Russell McQueeney (justru_at_[hidden])
Date: 2009-03-27 18:29:23


Jeff Squyres wrote:
> Hmm -- puzzling -- the error file you sent shows the following:
>
> bash: /opt/openmpi/orted: No such file or directory
>
> But that shouldn't happen; according to your config.log, you installed
> with a prefix of /opt/openmpi, so Open MPI should be looking for orted
> in /opt/openmpi/bin/orted.
>
> You said that the command was
>
>> command = mpirun --hostfile hostfile -np 2 echo `uname -a`
>
> Is there any chance that you ran with mpirun's absolute filename, such
> as:
>
> /opt/openmpi/bin/mpirun --hostfile hostfile -np 2 echo `uname -a`
>
> Or do you have any aliases involved? I can't imagine how you're
> getting that error message -- Open MPI should never use a full path
> name for orted unless you specified --prefix on the mpirun command
> line (which you didn't), or you used a full path name for mpirun (which
> it looks like you didn't, and even if you did use
> /opt/openmpi/bin/mpirun, it should use that path to look for
> /opt/openmpi/bin/orted on the other node). Otherwise, Open MPI relies
> on the PATH set in your shell startup files on remote nodes to find
> the orted.
>
> This is very odd -- can you look at the exact command that is being
> executed on the remote node?
>
>
> On Mar 27, 2009, at 12:24 PM, Russell McQueeney wrote:
>
>> command = mpirun --hostfile hostfile -np 2 echo `uname -a`
>> PATH = ...:/opt/openmpi/bin
>> LD_LIBRARY_PATH = /opt/openmpi/lib
>> no MCA parameters used
>>
>> I set the default shell to bash and put some echoes in .bash_profile
>> and .bashrc. When I run the mpirun command, I see those echoes, but
>> then it stops and the job never completes.
>>
>> Ralph Castain wrote:
>> > Could you please send the info shown here:
>> >
>> > http://www.open-mpi.org/community/help/
>> >
>> > If the ess is failing, then we don't recognize the environment.
>> > Probably an issue with how it is configured vs being run.
>> >
>> > Thanks
>> > Ralph
>> >
>> > On Mar 26, 2009, at 3:42 PM, Russell McQueeney wrote:
>> >
>> >> I installed Open MPI 1.3.1, and whenever I or mpirun try to start
>> >> orted on any of the machines, it shows that message along with
>> >> --> Returned value Not found (-13) instead of ORTE_SUCCESS
>> >> Is there anything obvious that I missed?
>> >> My machines are Intel x86-32, running Fedora (10 and 2).
>> >>
>> >> _______________________________________________
>> >> users mailing list
>> >> users_at_[hidden] <mailto:users_at_[hidden]>
>> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> >
>> >
>>
>> <config.log.bz2><ompi_info.bz2><orted_errors.bz2><ifconfig.bz2><ATT7963694.txt>
>>
>
>
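The orted lookup rules Jeff describes can be summarized as a small decision function. This is a hypothetical helper written for illustration, not Open MPI source code; the name `orted_path` is made up, and checking `--prefix` before an absolute mpirun path is an assumption. None of the three branches can yield `/opt/openmpi/orted` (no `bin/`), which is why that path in the error file looked wrong.

```shell
#!/usr/bin/env bash
# Hypothetical helper (NOT Open MPI source code): prints the orted command a
# remote node would be asked to run, following the rules described in this
# thread for Open MPI 1.3.x.
#   $1 = how mpirun was invoked ("mpirun" or e.g. "/opt/openmpi/bin/mpirun")
#   $2 = value given to mpirun --prefix, if any (checking it first is an assumption)
orted_path() {
  local mpirun_cmd=$1 prefix=${2:-} bindir
  if [ -n "$prefix" ]; then
    # --prefix given: look for <prefix>/bin/orted
    echo "$prefix/bin/orted"
  elif [ "${mpirun_cmd#/}" != "$mpirun_cmd" ]; then
    # Absolute mpirun path /X/bin/mpirun: strip "/mpirun", then "/bin", to get /X
    bindir=${mpirun_cmd%/*}
    echo "${bindir%/*}/bin/orted"
  else
    # Neither: plain "orted", found via the PATH that the remote node's
    # shell startup files set up
    echo "orted"
  fi
}

orted_path mpirun
orted_path /opt/openmpi/bin/mpirun
orted_path mpirun /opt/openmpi
```

For the command in this thread (plain `mpirun`, no `--prefix`), the remote node is simply asked to run `orted`, so what matters is the PATH its non-interactive shell ends up with.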
Oops. I just ran `/opt/openmpi/orted 2>orted_erros ; bzip2
orted_errors` and didn't check the file before I attached it. What ends
up happening is that I have to kill mpirun with ^C on the head node, and
all the other nodes are left with a zombie, unresponsive 'orted' process
that I have to kill manually. Interestingly, no matter which environment
variables I set and no matter which machine I use, running `orted` or
`/opt/openmpi/bin/orted` gives exactly the same error. I have attached
the real orted errors file here. The reason bash was complaining was
incorrect syntax on the stderr redirect: `orted 2> orted_errors`
instead of the correct version, `orted 2>orted_errors`.