
Open MPI User's Mailing List Archives


From: Dino Rossegger (dino.rossegger_at_[hidden])
Date: 2007-10-01 15:56:16


Hi again,

Yes, the error output is the same:
root@sun:~# mpirun --hostfile hostfile main
[sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file
base/pls_base_orted_cmds.c at line 275
[sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
line 1164
[sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
[sun:23748] ERROR: A daemon on node saturn failed to start as expected.
[sun:23748] ERROR: There may be more information available from
[sun:23748] ERROR: the remote shell (see above).
[sun:23748] ERROR: The daemon exited unexpectedly with status 255.
[sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file
base/pls_base_orted_cmds.c at line 188
[sun:23748] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
line 1196
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons for this job.
Returned value Timeout instead of ORTE_SUCCESS.

--------------------------------------------------------------------------

I wrote the following to my ~/.ssh/environment (on all machines):
LD_LIBRARY_PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/c3-4/:/usr/lib:/usr/local/lib;

PATH=$PATH:/usr/local/lib;

export LD_LIBRARY_PATH;
export PATH;

and added the statement you told me about to sshd_config (on all machines):
PermitUserEnvironment yes

It seems to me that the paths are correct now.
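For comparison, here is a minimal sketch of an ~/.ssh/environment in the format sshd reads (only when PermitUserEnvironment is enabled). As far as I know, sshd takes each NAME=value line literally: it does not expand $PATH, lines such as "export PATH;" are skipped as malformed, and a trailing ";" becomes part of the value. With the file above, that would make PATH the literal string "$PATH:/usr/local/lib;" and leave "/usr/local/lib;" (with the semicolon) in LD_LIBRARY_PATH. The directories below assume Open MPI is installed under /usr/local, and the helper name write_ssh_environment is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write an ~/.ssh/environment in the format sshd expects.
# sshd takes each NAME=value line literally: no $VAR expansion, no "export",
# and a trailing ";" would become part of the value.
# Assumption: Open MPI lives under /usr/local (bin/ and lib/).
write_ssh_environment() {
    target=$1    # normally "$HOME/.ssh/environment" on every node
    cat > "$target" <<'EOF'
# Read by sshd at login when PermitUserEnvironment is enabled.
PATH=/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin
LD_LIBRARY_PATH=/usr/local/lib
EOF
    chmod 600 "$target"
}

# Usage (once on each node):
#   write_ssh_environment "$HOME/.ssh/environment"
```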

My shell is bash (/bin/bash)

When running locate orted (to find out where exactly my Open MPI
installation is, i.e. the compilation defaults), I saw that on sun there
was a /usr/bin/orted, while there wasn't one on saturn.
I deleted /usr/bin/orted on sun and tried again with the option --prefix
/usr/local/ (which seems to be my installation directory), but it
didn't work (same error).
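One way to see what each node really provides is to run the same probe through a non-interactive ssh session, which is the kind of session mpirun's rsh/ssh launcher uses to start orted. A sketch: the host names sun and saturn come from this thread, while the variable probe and the helper check_nodes are hypothetical.

```shell
#!/bin/sh
# Commands run on each node: where is orted, and which paths does a
# non-interactive ssh session see? (Roughly the environment orted gets.)
probe='echo "orted: $(command -v orted || echo not found)"; echo "PATH=$PATH"; echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-}"'

# Hypothetical helper: run the probe on every host given as an argument.
check_nodes() {
    for host in "$@"; do
        echo "== $host"
        ssh "$host" "$probe"
    done
}

# Usage, from sun:
#   check_nodes sun saturn
```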

Is there a script or something similar with which I can uninstall
Open MPI? I might try a fresh build installed to /opt/openmpi, since
it doesn't look like I will be able to solve the problem.
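If Open MPI was built from a source tarball and the original build tree still exists, the standard Automake "make uninstall" target should remove what "make install" put in place; a new build can then be configured with --prefix=/opt/openmpi. A sketch under those assumptions: the function name, its arguments and the example paths are hypothetical, and setting DRY_RUN=1 only prints the commands. Files installed by other means (for example a stray /usr/bin/orted) are not handled.

```shell
#!/bin/sh
# Hypothetical helper: uninstall the old Open MPI build, then build and
# install a fresh copy under a new prefix (default /opt/openmpi).
# Set DRY_RUN=1 to print the commands instead of running them.
reinstall_openmpi() {
    old_build_dir=$1           # tree where "make install" was originally run
    src_dir=$2                 # unpacked Open MPI source for the new build
    new_prefix=${3:-/opt/openmpi}
    run=${DRY_RUN:+echo}       # empty unless DRY_RUN is set

    (cd "$old_build_dir" && $run make uninstall) || return 1
    (cd "$src_dir" &&
        $run ./configure --prefix="$new_prefix" &&
        $run make all install)
}

# Usage as root on every node (hypothetical paths):
#   reinstall_openmpi /usr/src/openmpi-old /usr/src/openmpi-new /opt/openmpi
```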

jody wrote:
> Now that the PATHs seem to be set correctly for
> ssh, I don't know what the problem could be.
>
> Is the error message still the same as in the first mail?
> Did you make the environment/sshd_config changes on both machines?
> What shell are you using?
>
> One other test you could make is to start your application
> with the --prefix option:
>
> $ mpirun -np 2 --prefix /opt/openmpi -H sun,saturn ./main
>
> (assuming your Open MPI installation lies in /opt/openmpi
> on both machines)
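The --prefix test can also be run without hard-coding the directory. According to the mpirun documentation, --prefix makes mpirun set PATH and LD_LIBRARY_PATH on the remote nodes (to the prefix's bin and lib directories) before starting the Open MPI daemon there. The sketch below derives the prefix from wherever mpirun is found locally, assuming the usual <prefix>/bin/mpirun layout and the same install location on both nodes; the helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: derive the install prefix from an mpirun path,
# assuming the usual <prefix>/bin/mpirun layout.
#   /usr/local/bin/mpirun   -> /usr/local
#   /opt/openmpi/bin/mpirun -> /opt/openmpi
mpi_prefix_of() {
    dirname "$(dirname "$1")"
}

# Hypothetical helper: start ./main on sun and saturn with an explicit
# --prefix. Assumes Open MPI sits under the same prefix on both nodes.
run_with_prefix() {
    mpirun_path=$(command -v mpirun) || { echo "mpirun not found" >&2; return 1; }
    prefix=$(mpi_prefix_of "$mpirun_path")
    echo "using --prefix $prefix"
    "$mpirun_path" -np 2 --prefix "$prefix" -H sun,saturn ./main
}
```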
>
>
> Jody
>
> On 10/1/07, Dino Rossegger <dino.rossegger_at_[hidden]> wrote:
>> Hi Jody,
>> I followed the steps as you described, but it didn't work for me.
>> I set LD_LIBRARY_PATH in /etc/environment and ~/.ssh/environment and
>> made the changes to sshd_config.
>>
>> But none of this solved my problem, although the paths seemed to be
>> set correctly (judging by what `ssh saturn printenv >> test` shows). I
>> also restarted the ssh server; the error is the same.
>>
>> I hope you can help me out here, and thanks for your help so far.
>> dino
>>
>> jody wrote:
>>> Dino -
>>> I had a similar problem.
>>> I was only able to solve it by setting PATH and LD_LIBRARY_PATH
>>> in the file ~/.ssh/environment on the client and setting
>>> PermitUserEnvironment yes
>>> in /etc/ssh/sshd_config on the server (for this you need root
>>> privileges, though).
>>>
>>> To be on the safe side, I did both on all my nodes.
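To confirm the server-side setting on each node, here is a sketch that reports the PermitUserEnvironment value from an sshd_config file (the default is no). It only looks at the first uncommented occurrence, since sshd generally uses the first value it reads for a keyword, and it does not handle Match blocks; sshd still has to be restarted or reloaded after the file changes. The helper name permit_user_env is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the first uncommented PermitUserEnvironment
# value in an sshd_config (keywords are case-insensitive), or "no" (the
# default) if the keyword is absent. Match blocks are not handled.
permit_user_env() {
    value=$(awk 'tolower($1) == "permituserenvironment" { print $2; exit }' "$1")
    echo "${value:-no}"
}

# Usage (on every node):
#   [ "$(permit_user_env /etc/ssh/sshd_config)" = yes ] || echo "not enabled"
```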
>>>
>>> Jody
>>>
>>> On 9/27/07, Dino Rossegger <dino.rossegger_at_[hidden]> wrote:
>>>> Hi Jody,
>>>>
>>>> Thanks for your help. It really is the case that neither in PATH nor in
>>>> LD_LIBRARY_PATH is the path to the libs set correctly. I'll try it out;
>>>> I hope it works.
>>>>
>>>> jody wrote:
>>>>> Hi Dino
>>>>>
>>>>> Try
>>>>> ssh saturn printenv | grep PATH
>>>>> from your host sun to see what your environment variables are when
>>>>> ssh is run without a shell.
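To pinpoint which directories are missing on the remote side, the PATH or LD_LIBRARY_PATH seen locally can be compared entry by entry with what `ssh saturn printenv` reports. A sketch; the helper name missing_entries is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print each entry of the colon-separated list $1
# that does not appear in the colon-separated list $2.
missing_entries() {
    printf '%s\n' "$1" | tr ':' '\n' | while IFS= read -r dir; do
        [ -n "$dir" ] || continue
        case ":$2:" in
            *":$dir:"*) ;;              # present in the second list
            *) printf '%s\n' "$dir" ;;  # missing from the second list
        esac
    done
}

# Usage, from sun:
#   missing_entries "$LD_LIBRARY_PATH" "$(ssh saturn printenv LD_LIBRARY_PATH)"
#   missing_entries "$PATH" "$(ssh saturn printenv PATH)"
```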
>>>>>
>>>>>
>>>>> On 9/27/07, Dino Rossegger <dino.rossegger_at_[hidden]> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I have a problem running a simple program, mpihello.cpp.
>>>>>>
>>>>>> Here is an excerpt of the error and the command:
>>>>>> root@sun:~# mpirun -H sun,saturn main
>>>>>> [sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>>>>>> base/pls_base_orted_cmds.c at line 275
>>>>>> [sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
>>>>>> line 1164
>>>>>> [sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
>>>>>> [sun:25213] ERROR: A daemon on node saturn failed to start as expected.
>>>>>> [sun:25213] ERROR: There may be more information available from
>>>>>> [sun:25213] ERROR: the remote shell (see above).
>>>>>> [sun:25213] ERROR: The daemon exited unexpectedly with status 255.
>>>>>> [sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>>>>>> base/pls_base_orted_cmds.c at line 188
>>>>>> [sun:25213] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
>>>>>> line 1196
>>>>>> --------------------------------------------------------------------------
>>>>>> mpirun was unable to cleanly terminate the daemons for this job.
>>>>>> Returned value Timeout instead of ORTE_SUCCESS.
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>>
>>>>>> The program runs on each node alone (mpirun -np 2 main).
>>>>>>
>>>>>> My path variables:
>>>>>> echo $PATH
>>>>>> /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/c3-4/:/usr/lib:/usr/local/lib
>>>>>> echo $LD_LIBRARY_PATH
>>>>>> /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/c3-4/:/usr/lib:/usr/local/lib
>>>>>>
>>>>>> Passwordless ssh is up 'n running
>>>>>>
>>>>>> I walked through the FAQ and Mailing Lists but couldn't find any
>>>>>> solution for my problem.
>>>>>>
>>>>>> Thanks
>>>>>> Dino R.
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> users_at_[hidden]
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>