Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OpenMPI and OAR issues
From: Ralph Castain (rhc_at_[hidden])
Date: 2008-11-06 15:42:57


Thanks for the OAR explanation!

Sorry - I should have been clearer in my comment. I meant that the
command starting with "set" is what triggers the bash syntax error,
and that error is why the launch fails.

The rsh launcher uses a little "probe" technique to try to guess the
remote shell. Apparently, it thinks this is tcsh, while the remote
node is actually running the command under bash.
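
To see the mismatch in isolation, here is a minimal sketch that hands a csh-style assignment to bash and then a Bourne-style equivalent. The /opt/openmpi prefix is a placeholder, not the path from the report, and the Bourne-style line is only an illustrative equivalent, not necessarily the exact string the launcher would generate.

```shell
# csh/tcsh-style prefix, shaped like the one in the failing launch
# (placeholder path, single-quoted so $path is passed through literally).
csh_line='set path = ( /opt/openmpi/bin $path )'

# Bourne-style equivalent (illustrative assumption).
sh_line='PATH=/opt/openmpi/bin:$PATH ; export PATH'

# Ask bash to run each line and record whether it was accepted.
if bash -c "$csh_line" 2>/dev/null; then csh_status=ok; else csh_status=syntax-error; fi
if bash -c "$sh_line" 2>/dev/null; then sh_status=ok; else sh_status=syntax-error; fi

echo "csh-style under bash: $csh_status"
echo "sh-style under bash:  $sh_status"
```

Dropping the 2>/dev/null on the first call shows the same "unexpected token" complaint about the opening parenthesis that appears in the quoted log.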

Are you running this from bash? If so, you could perhaps resolve the
problem by specifying -mca pls_rsh_assume_same_shell 1 on your command
line. This will override the probe and force the system to use the
syntax appropriate to the same shell you used for mpirun.
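
As a sketch, the original command with that parameter added might look like the following. OMPI_PREFIX and APP are hypothetical helper variables introduced only to keep the lines short; the paths and the other options are copied from the report quoted below. The snippet only prints the command, so it can be checked before anything is launched.

```shell
# Hypothetical helper variables (not in the original report), used for brevity.
OMPI_PREFIX=/n/poolfs/z/home/apellegr/openmpi
APP=/n/poolfs/z/home/apellegr/mpi_test/hello_world.x86

# Original options plus the suggested MCA parameter.
# OAR_FILE_NODES is expected to be set by OAR inside the job.
set -- mpirun -prefix "$OMPI_PREFIX" \
    -machinefile "$OAR_FILE_NODES" \
    -mca pls_rsh_agent oarsh \
    -mca pls_rsh_assume_same_shell 1 \
    -np 10 "$APP"

# Dry run: show the assembled command line.
cmdline="$*"
echo "$cmdline"

# To launch for real inside the OAR job, run: "$@"
```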

Alternatively, you could set -mca pls_rsh_debug 1 to see all the debug
output as the system probes your remote shell. Might help you figure
out why it thinks it is tcsh.
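
With debugging enabled, the lines of interest are the ones where the launcher reports which shells it settled on. A small sketch, assuming the debug output has been saved to a file (here a temporary file filled with sample lines copied from the debug output quoted below):

```shell
# Sample lines copied from the quoted debug output. In practice the file
# would hold the captured output of mpirun run with -mca pls_rsh_debug 1.
log=$(mktemp)
cat > "$log" <<'EOF'
[m45-037.pool:29473] pls:rsh: local shell: 2 (tcsh)
[m45-037.pool:29473] pls:rsh: assuming same remote shell as local shell
[m45-037.pool:29473] pls:rsh: remote shell: 2 (tcsh)
[m45-037.pool:29473] pls:rsh: launching on node m45-037.pool
EOF

# Keep only the shell-detection lines.
shell_lines=$(grep 'pls:rsh:.*shell' "$log")
echo "$shell_lines"
```

In the quoted run, these lines report the local shell as tcsh and show the launcher assuming the same shell on the remote side.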

Ralph

On Nov 6, 2008, at 1:31 PM, George Bosilca wrote:

> OAR is the batch scheduler used on the Grid5K platform. As far as I
> know, set is a basic shell internal command, and it is understood by
> all shells. The problem here seems to be that somehow we're using
> bash, but with tcsh shell code (because setenv is definitely not
> something that bash understands).
>
> george.
>
> On Nov 6, 2008, at 3:07 PM, Ralph Castain wrote:
>
>> I have no idea what "oar" is, but it looks to me like the rsh
>> launcher is getting confused about the remote shell it will use - I
>> don't believe that the "set" cmd shown below is proper bash syntax,
>> and that is the error that is causing the launch to fail.
>>
>> What remote shell should it find? I know we don't have any "oar"
>> shell-specific code in the system, but maybe it looks like
>> something else?
>>
>> On Nov 6, 2008, at 12:55 PM, Andrea Pellegrini wrote:
>>
>>> Hi all,
>>> I'm trying to run an Open MPI application on an OAR cluster. I think
>>> the cluster is configured correctly but I still have problems when
>>> I run mpirun:
>>>
>>> apellegr_at_m45-037:~$ mpirun -prefix /n/poolfs/z/home/apellegr/openmpi -machinefile $OAR_FILE_NODES -mca pls_rsh_agent "oarsh" -np 10 /n/poolfs/z/home/apellegr/mpi_test/hello_world.x86
>>> bash: -c: line 0: syntax error near unexpected token `('
>>> bash: -c: line 0: ` set path = ( /n/poolfs/z/home/apellegr/openmpi/
>>> bin $path ) ; if ( $?LD_LIBRARY_PATH == 1 ) set OMPI_have_llp ; if
>>> ( $?LD_LIBRARY_PATH == 0 ) setenv LD_LIBRARY_PATH /n/poolfs/z/home/
>>> apellegr/openmpi/lib ; if ( $?OMPI_have_llp == 1 ) setenv
>>> LD_LIBRARY_PATH /n/poolfs/z/home/apellegr/openmpi/lib:
>>> $LD_LIBRARY_PATH ; /n/poolfs/z/home/apellegr/openmpi/bin/orted --
>>> bootproxy 1 --name 0.0.4 --num_procs 5 --vpid_start 0 --nodename
>>> m45-040.pool --universe apellegr_at_m45-037.pool:default-
>>> universe-29482 --nsreplica "0.0.0;tcp://10.11.45.37:36790" --
>>> gprreplica "0.0.0;tcp://10.11.45.37:36790"'
>>> bash: -c: line 0: syntax error near unexpected token `('
>>> bash: -c: line 0: ` set path = ( /n/poolfs/z/home/apellegr/openmpi/
>>> bin $path ) ; if ( $?LD_LIBRARY_PATH == 1 ) set OMPI_have_llp ; if
>>> ( $?LD_LIBRARY_PATH == 0 ) setenv LD_LIBRARY_PATH /n/poolfs/z/home/
>>> apellegr/openmpi/lib ; if ( $?OMPI_have_llp == 1 ) setenv
>>> LD_LIBRARY_PATH /n/poolfs/z/home/apellegr/openmpi/lib:
>>> $LD_LIBRARY_PATH ; /n/poolfs/z/home/apellegr/openmpi/bin/orted --
>>> bootproxy 1 --name 0.0.2 --num_procs 5 --vpid_start 0 --nodename
>>> m45-038.pool --universe apellegr_at_m45-037.pool:default-
>>> universe-29482 --nsreplica "0.0.0;tcp://10.11.45.37:36790" --
>>> gprreplica "0.0.0;tcp://10.11.45.37:36790"'
>>> [m45-037.pool:29482] ERROR: A daemon on node m45-038.pool failed
>>> to start as expected.
>>> [m45-037.pool:29482] ERROR: There may be more information
>>> available from
>>> [m45-037.pool:29482] ERROR: the remote shell (see above).
>>> [m45-037.pool:29482] ERROR: The daemon exited unexpectedly with
>>> status 2.
>>> bash: -c: line 0: syntax error near unexpected token `('
>>> bash: -c: line 0: ` set path = ( /n/poolfs/z/home/apellegr/openmpi/
>>> bin $path ) ; if ( $?LD_LIBRARY_PATH == 1 ) set OMPI_have_llp ; if
>>> ( $?LD_LIBRARY_PATH == 0 ) setenv LD_LIBRARY_PATH /n/poolfs/z/home/
>>> apellegr/openmpi/lib ; if ( $?OMPI_have_llp == 1 ) setenv
>>> LD_LIBRARY_PATH /n/poolfs/z/home/apellegr/openmpi/lib:
>>> $LD_LIBRARY_PATH ; /n/poolfs/z/home/apellegr/openmpi/bin/orted --
>>> bootproxy 1 --name 0.0.3 --num_procs 5 --vpid_start 0 --nodename
>>> m45-039.pool --universe apellegr_at_m45-037.pool:default-
>>> universe-29482 --nsreplica "0.0.0;tcp://10.11.45.37:36790" --
>>> gprreplica "0.0.0;tcp://10.11.45.37:36790"'
>>> [m45-037.pool:29482] ERROR: A daemon on node m45-039.pool failed
>>> to start as expected.
>>> [m45-037.pool:29482] ERROR: There may be more information
>>> available from
>>> [m45-037.pool:29482] ERROR: the remote shell (see above).
>>> [m45-037.pool:29482] ERROR: The daemon exited unexpectedly with
>>> status 2.
>>> [m45-037.pool:29482] [0,0,0] ORTE_ERROR_LOG: Timeout in
>>> file ../../../../orte/mca/pls/base/pls_base_orted_cmds.c at line 275
>>> [m45-037.pool:29482] [0,0,0] ORTE_ERROR_LOG: Timeout in
>>> file ../../../../../orte/mca/pls/rsh/pls_rsh_module.c at line 1158
>>> [m45-037.pool:29482] [0,0,0] ORTE_ERROR_LOG: Timeout in
>>> file ../../../../../orte/mca/errmgr/hnp/errmgr_hnp.c at line 90
>>> [m45-037.pool:29482] ERROR: A daemon on node m45-040.pool failed
>>> to start as expected.
>>> [m45-037.pool:29482] ERROR: There may be more information
>>> available from
>>> [m45-037.pool:29482] ERROR: the remote shell (see above).
>>> [m45-037.pool:29482] ERROR: The daemon exited unexpectedly with
>>> status 2.
>>> [m45-037.pool:29482] [0,0,0] ORTE_ERROR_LOG: Timeout in
>>> file ../../../../orte/mca/pls/base/pls_base_orted_cmds.c at line 188
>>> [m45-037.pool:29482] [0,0,0] ORTE_ERROR_LOG: Timeout in
>>> file ../../../../../orte/mca/pls/rsh/pls_rsh_module.c at line 1190
>>> --------------------------------------------------------------------------
>>> mpirun was unable to cleanly terminate the daemons for this job.
>>> Returned value Timeout instead of ORTE_SUCCESS.
>>> --------------------------------------------------------------------------
>>> apellegr_at_m45-037:~$
>>>
>>>
>>> If I run it with the option "-mca pls_rsh_debug 1" I get:
>>>
>>> apellegr_at_m45-037:~$ mpirun -prefix /n/poolfs/z/home/apellegr/openmpi -machinefile $OAR_FILE_NODES -mca pls_rsh_debug 1 -mca pls_rsh_agent "oarsh" -np 10 /n/poolfs/z/home/apellegr/mpi_test/hello_world.x86
>>> [m45-037.pool:29473] pls:rsh: local shell: 2 (tcsh)
>>> [m45-037.pool:29473] pls:rsh: assuming same remote shell as local
>>> shell
>>> [m45-037.pool:29473] pls:rsh: remote shell: 2 (tcsh)
>>> [m45-037.pool:29473] pls:rsh: final template argv:
>>> [m45-037.pool:29473] pls:rsh: /usr/bin/oarsh <template> orted
>>> --bootproxy 1 --name <template> --num_procs 5 --vpid_start 0 --
>>> nodename <template> --universe apellegr_at_m45-037.pool:default-
>>> universe-29473 --nsreplica "0.0.0;tcp://10.11.45.37:55477" --
>>> gprreplica "0.0.0;tcp://10.11.45.37:55477"
>>> [m45-037.pool:29473] pls:rsh: launching on node m45-037.pool
>>> [m45-037.pool:29473] pls:rsh: m45-037.pool is a LOCAL node
>>> [m45-037.pool:29473] pls:rsh: reset PATH: /n/poolfs/z/home/
>>> apellegr/openmpi/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/
>>> bin:/sbin:/bin:/usr/X11R6/bin:/n/poolfs/z/home/apellegr/openmpi/
>>> bin:/n/poolfs/z/home/apellegr/openssl/bin
>>> [m45-037.pool:29473] pls:rsh: reset LD_LIBRARY_PATH: /n/poolfs/z/
>>> home/apellegr/openmpi/lib
>>> [m45-037.pool:29473] pls:rsh: changing to directory /home/apellegr
>>> [m45-037.pool:29473] pls:rsh: executing: (/n/poolfs/z/home/
>>> apellegr/openmpi/bin/orted) orted --bootproxy 1 --name 0.0.1 --
>>> num_procs 5 --vpid_start 0 --nodename m45-037.pool --universe apellegr_at_m45-037.pool
>>> :default-universe-29473 --nsreplica "0.0.0;tcp://
>>> 10.11.45.37:55477" --gprreplica "0.0.0;tcp://10.11.45.37:55477" --
>>> set-sid [OAR_JOBID=597856 HOST=m45-037.pool TERM=xterm SHELL=/bin/
>>> tcsh OAR_WORKING_DIRECTORY=/home/apellegr SSH_CLIENT=10.11.0.4
>>> 50481 6667 OAR_USER=apellegr GROUP=csestudents USER=apellegr
>>> SUDO_USER=oar OAR_WORKDIR=/home/apellegr SUDO_UID=30143
>>> HOSTTYPE=i486-linux USERNAME=apellegr OAR_JOB_NAME= OAR_NODE_FILE=/
>>> var/lib/oar/597856 OAR_RESOURCE_PROPERTIES_FILE=/var/lib/oar/
>>> 597856_resources MAIL=/var/mail/oar PATH=/n/poolfs/z/home/apellegr/
>>> openmpi/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/
>>> sbin:/bin:/usr/X11R6/bin:/n/poolfs/z/home/apellegr/openmpi/bin:/n/
>>> poolfs/z/home/apellegr/openssl/bin OAR_PROJECT_NAME=default
>>> OAR_JOB_WALLTIME_SECONDS=7200 PWD=/home/apellegr HOME=/home/
>>> apellegr SUDO_COMMAND=OAR SHLVL=2 OAR_FILE_NODES=/var/lib/oar/
>>> 597856 OSTYPE=linux VENDOR=intel OAR_JOB_WALLTIME=2:0:0
>>> MACHTYPE=i486 LOGNAME=apellegr OAR_NODEFILE=/var/lib/oar/597856
>>> OAR_RESOURCE_FILE=/var/lib/oar/597856 SUDO_GID=390
>>> OAR_JOB_ID=597856 OAR_O_WORKDIR=/home/apellegr _=/n/poolfs/z/home/
>>> apellegr/openmpi/bin/mpirun OLDPWD=/home/apellegr/openmpi
>>> OMPI_MCA_rds_hostfile_path=/var/lib/oar/597856
>>> OMPI_MCA_pls_rsh_debug=1 OMPI_MCA_pls_rsh_agent=oarsh
>>> LD_LIBRARY_PATH=/n/poolfs/z/home/apellegr/openmpi/lib
>>> OMPI_MCA_seed=0]
>>> [m45-037.pool:29473] pls:rsh: launching on node m45-038.pool
>>> [m45-037.pool:29473] pls:rsh: m45-038.pool is a REMOTE node
>>> [m45-037.pool:29473] pls:rsh: executing: (//usr/bin/oarsh) /usr/
>>> bin/oarsh m45-038.pool set path = ( /n/poolfs/z/home/apellegr/
>>> openmpi/bin $path ) ; if ( $?LD_LIBRARY_PATH == 1 ) set
>>> OMPI_have_llp ; if ( $?LD_LIBRARY_PATH == 0 ) setenv
>>> LD_LIBRARY_PATH /n/poolfs/z/home/apellegr/openmpi/lib ; if ( $?
>>> OMPI_have_llp == 1 ) setenv LD_LIBRARY_PATH /n/poolfs/z/home/
>>> apellegr/openmpi/lib:$LD_LIBRARY_PATH ; /n/poolfs/z/home/apellegr/
>>> openmpi/bin/orted --bootproxy 1 --name 0.0.2 --num_procs 5 --
>>> vpid_start 0 --nodename m45-038.pool --universe apellegr_at_m45-037.pool
>>> :default-universe-29473 --nsreplica "0.0.0;tcp://
>>> 10.11.45.37:55477" --gprreplica "0.0.0;tcp://
>>> 10.11.45.37:55477" [OAR_JOBID=597856 HOST=m45-037.pool TERM=xterm
>>> SHELL=/bin/tcsh OAR_WORKING_DIRECTORY=/home/apellegr
>>> SSH_CLIENT=10.11.0.4 50481 6667 OAR_USER=apellegr
>>> GROUP=csestudents USER=apellegr SUDO_USER=oar OAR_WORKDIR=/home/
>>> apellegr SUDO_UID=30143 HOSTTYPE=i486-linux USERNAME=apellegr
>>> OAR_JOB_NAME= OAR_NODE_FILE=/var/lib/oar/597856
>>> OAR_RESOURCE_PROPERTIES_FILE=/var/lib/oar/597856_resources MAIL=/
>>> var/mail/oar PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/
>>> bin:/sbin:/bin:/usr/X11R6/bin:/n/poolfs/z/home/apellegr/openmpi/
>>> bin:/n/poolfs/z/home/apellegr/openssl/bin OAR_PROJECT_NAME=default
>>> OAR_JOB_WALLTIME_SECONDS=7200 PWD=/home/apellegr HOME=/home/
>>> apellegr SUDO_COMMAND=OAR SHLVL=2 OAR_FILE_NODES=/var/lib/oar/
>>> 597856 OSTYPE=linux VENDOR=intel OAR_JOB_WALLTIME=2:0:0
>>> MACHTYPE=i486 LOGNAME=apellegr OAR_NODEFILE=/var/lib/oar/597856
>>> OAR_RESOURCE_FILE=/var/lib/oar/597856 SUDO_GID=390
>>> OAR_JOB_ID=597856 OAR_O_WORKDIR=/home/apellegr _=/n/poolfs/z/home/
>>> apellegr/openmpi/bin/mpirun OLDPWD=/home/apellegr/openmpi
>>> OMPI_MCA_rds_hostfile_path=/var/lib/oar/597856
>>> OMPI_MCA_pls_rsh_debug=1 OMPI_MCA_pls_rsh_agent=oarsh
>>> OMPI_MCA_seed=0]
>>> bash: -c: line 0: syntax error near unexpected token `('
>>> bash: -c: line 0: ` set path = ( /n/poolfs/z/home/apellegr/openmpi/
>>> bin $path ) ; if ( $?LD_LIBRARY_PATH == 1 ) set OMPI_have_llp ; if
>>> ( $?LD_LIBRARY_PATH == 0 ) setenv LD_LIBRARY_PATH /n/poolfs/z/home/
>>> apellegr/openmpi/lib ; if ( $?OMPI_have_llp == 1 ) setenv
>>> LD_LIBRARY_PATH /n/poolfs/z/home/apellegr/openmpi/lib:
>>> $LD_LIBRARY_PATH ; /n/poolfs/z/home/apellegr/openmpi/bin/orted --
>>> bootproxy 1 --name 0.0.2 --num_procs 5 --vpid_start 0 --nodename
>>> m45-038.pool --universe apellegr_at_m45-037.pool:default-
>>> universe-29473 --nsreplica "0.0.0;tcp://10.11.45.37:55477" --
>>> gprreplica "0.0.0;tcp://10.11.45.37:55477"'
>>> [m45-037.pool:29473] pls:rsh: launching on node m45-039.pool
>>> [m45-037.pool:29473] ERROR: A daemon on node m45-038.pool failed
>>> to start as expected.
>>> [m45-037.pool:29473] ERROR: There may be more information
>>> available from
>>> [m45-037.pool:29473] ERROR: the remote shell (see above).
>>> [m45-037.pool:29473] ERROR: The daemon exited unexpectedly with
>>> status 2.
>>> [m45-037.pool:29473] pls:rsh: m45-039.pool is a REMOTE node
>>> [m45-037.pool:29473] pls:rsh: executing: (//usr/bin/oarsh) /usr/
>>> bin/oarsh m45-039.pool set path = ( /n/poolfs/z/home/apellegr/
>>> openmpi/bin $path ) ; if ( $?LD_LIBRARY_PATH == 1 ) set
>>> OMPI_have_llp ; if ( $?LD_LIBRARY_PATH == 0 ) setenv
>>> LD_LIBRARY_PATH /n/poolfs/z/home/apellegr/openmpi/lib ; if ( $?
>>> OMPI_have_llp == 1 ) setenv LD_LIBRARY_PATH /n/poolfs/z/home/
>>> apellegr/openmpi/lib:$LD_LIBRARY_PATH ; /n/poolfs/z/home/apellegr/
>>> openmpi/bin/orted --bootproxy 1 --name 0.0.3 --num_procs 5 --
>>> vpid_start 0 --nodename m45-039.pool --universe apellegr_at_m45-037.pool
>>> :default-universe-29473 --nsreplica "0.0.0;tcp://
>>> 10.11.45.37:55477" --gprreplica "0.0.0;tcp://
>>> 10.11.45.37:55477" [OAR_JOBID=597856 HOST=m45-037.pool TERM=xterm
>>> SHELL=/bin/tcsh OAR_WORKING_DIRECTORY=/home/apellegr
>>> SSH_CLIENT=10.11.0.4 50481 6667 OAR_USER=apellegr
>>> GROUP=csestudents USER=apellegr SUDO_USER=oar OAR_WORKDIR=/home/
>>> apellegr SUDO_UID=30143 HOSTTYPE=i486-linux USERNAME=apellegr
>>> OAR_JOB_NAME= OAR_NODE_FILE=/var/lib/oar/597856
>>> OAR_RESOURCE_PROPERTIES_FILE=/var/lib/oar/597856_resources MAIL=/
>>> var/mail/oar PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/
>>> bin:/sbin:/bin:/usr/X11R6/bin:/n/poolfs/z/home/apellegr/openmpi/
>>> bin:/n/poolfs/z/home/apellegr/openssl/bin OAR_PROJECT_NAME=default
>>> OAR_JOB_WALLTIME_SECONDS=7200 PWD=/home/apellegr HOME=/home/
>>> apellegr SUDO_COMMAND=OAR SHLVL=2 OAR_FILE_NODES=/var/lib/oar/
>>> 597856 OSTYPE=linux VENDOR=intel OAR_JOB_WALLTIME=2:0:0
>>> MACHTYPE=i486 LOGNAME=apellegr OAR_NODEFILE=/var/lib/oar/597856
>>> OAR_RESOURCE_FILE=/var/lib/oar/597856 SUDO_GID=390
>>> OAR_JOB_ID=597856 OAR_O_WORKDIR=/home/apellegr _=/n/poolfs/z/home/
>>> apellegr/openmpi/bin/mpirun OLDPWD=/home/apellegr/openmpi
>>> OMPI_MCA_rds_hostfile_path=/var/lib/oar/597856
>>> OMPI_MCA_pls_rsh_debug=1 OMPI_MCA_pls_rsh_agent=oarsh
>>> OMPI_MCA_seed=0]
>>> bash: -c: line 0: syntax error near unexpected token `('
>>> bash: -c: line 0: ` set path = ( /n/poolfs/z/home/apellegr/openmpi/
>>> bin $path ) ; if ( $?LD_LIBRARY_PATH == 1 ) set OMPI_have_llp ; if
>>> ( $?LD_LIBRARY_PATH == 0 ) setenv LD_LIBRARY_PATH /n/poolfs/z/home/
>>> apellegr/openmpi/lib ; if ( $?OMPI_have_llp == 1 ) setenv
>>> LD_LIBRARY_PATH /n/poolfs/z/home/apellegr/openmpi/lib:
>>> $LD_LIBRARY_PATH ; /n/poolfs/z/home/apellegr/openmpi/bin/orted --
>>> bootproxy 1 --name 0.0.3 --num_procs 5 --vpid_start 0 --nodename
>>> m45-039.pool --universe apellegr_at_m45-037.pool:default-
>>> universe-29473 --nsreplica "0.0.0;tcp://10.11.45.37:55477" --
>>> gprreplica "0.0.0;tcp://10.11.45.37:55477"'
>>> [m45-037.pool:29473] pls:rsh: launching on node m45-040.pool
>>> [m45-037.pool:29473] ERROR: A daemon on node m45-039.pool failed
>>> to start as expected.
>>> [m45-037.pool:29473] ERROR: There may be more information
>>> available from
>>> [m45-037.pool:29473] ERROR: the remote shell (see above).
>>> [m45-037.pool:29473] ERROR: The daemon exited unexpectedly with
>>> status 2.
>>> [m45-037.pool:29473] pls:rsh: m45-040.pool is a REMOTE node
>>> [m45-037.pool:29473] pls:rsh: executing: (//usr/bin/oarsh) /usr/
>>> bin/oarsh m45-040.pool set path = ( /n/poolfs/z/home/apellegr/
>>> openmpi/bin $path ) ; if ( $?LD_LIBRARY_PATH == 1 ) set
>>> OMPI_have_llp ; if ( $?LD_LIBRARY_PATH == 0 ) setenv
>>> LD_LIBRARY_PATH /n/poolfs/z/home/apellegr/openmpi/lib ; if ( $?
>>> OMPI_have_llp == 1 ) setenv LD_LIBRARY_PATH /n/poolfs/z/home/
>>> apellegr/openmpi/lib:$LD_LIBRARY_PATH ; /n/poolfs/z/home/apellegr/
>>> openmpi/bin/orted --bootproxy 1 --name 0.0.4 --num_procs 5 --
>>> vpid_start 0 --nodename m45-040.pool --universe apellegr_at_m45-037.pool
>>> :default-universe-29473 --nsreplica "0.0.0;tcp://
>>> 10.11.45.37:55477" --gprreplica "0.0.0;tcp://
>>> 10.11.45.37:55477" [OAR_JOBID=597856 HOST=m45-037.pool TERM=xterm
>>> SHELL=/bin/tcsh OAR_WORKING_DIRECTORY=/home/apellegr
>>> SSH_CLIENT=10.11.0.4 50481 6667 OAR_USER=apellegr
>>> GROUP=csestudents USER=apellegr SUDO_USER=oar OAR_WORKDIR=/home/
>>> apellegr SUDO_UID=30143 HOSTTYPE=i486-linux USERNAME=apellegr
>>> OAR_JOB_NAME= OAR_NODE_FILE=/var/lib/oar/597856
>>> OAR_RESOURCE_PROPERTIES_FILE=/var/lib/oar/597856_resources MAIL=/
>>> var/mail/oar PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/
>>> bin:/sbin:/bin:/usr/X11R6/bin:/n/poolfs/z/home/apellegr/openmpi/
>>> bin:/n/poolfs/z/home/apellegr/openssl/bin OAR_PROJECT_NAME=default
>>> OAR_JOB_WALLTIME_SECONDS=7200 PWD=/home/apellegr HOME=/home/
>>> apellegr SUDO_COMMAND=OAR SHLVL=2 OAR_FILE_NODES=/var/lib/oar/
>>> 597856 OSTYPE=linux VENDOR=intel OAR_JOB_WALLTIME=2:0:0
>>> MACHTYPE=i486 LOGNAME=apellegr OAR_NODEFILE=/var/lib/oar/597856
>>> OAR_RESOURCE_FILE=/var/lib/oar/597856 SUDO_GID=390
>>> OAR_JOB_ID=597856 OAR_O_WORKDIR=/home/apellegr _=/n/poolfs/z/home/
>>> apellegr/openmpi/bin/mpirun OLDPWD=/home/apellegr/openmpi
>>> OMPI_MCA_rds_hostfile_path=/var/lib/oar/597856
>>> OMPI_MCA_pls_rsh_debug=1 OMPI_MCA_pls_rsh_agent=oarsh
>>> OMPI_MCA_seed=0]
>>> bash: -c: line 0: syntax error near unexpected token `('
>>> bash: -c: line 0: ` set path = ( /n/poolfs/z/home/apellegr/openmpi/
>>> bin $path ) ; if ( $?LD_LIBRARY_PATH == 1 ) set OMPI_have_llp ; if
>>> ( $?LD_LIBRARY_PATH == 0 ) setenv LD_LIBRARY_PATH /n/poolfs/z/home/
>>> apellegr/openmpi/lib ; if ( $?OMPI_have_llp == 1 ) setenv
>>> LD_LIBRARY_PATH /n/poolfs/z/home/apellegr/openmpi/lib:
>>> $LD_LIBRARY_PATH ; /n/poolfs/z/home/apellegr/openmpi/bin/orted --
>>> bootproxy 1 --name 0.0.4 --num_procs 5 --vpid_start 0 --nodename
>>> m45-040.pool --universe apellegr_at_m45-037.pool:default-
>>> universe-29473 --nsreplica "0.0.0;tcp://10.11.45.37:55477" --
>>> gprreplica "0.0.0;tcp://10.11.45.37:55477"'
>>> [m45-037.pool:29473] ERROR: A daemon on node m45-040.pool failed
>>> to start as expected.
>>> [m45-037.pool:29473] ERROR: There may be more information
>>> available from
>>> [m45-037.pool:29473] ERROR: the remote shell (see above).
>>> [m45-037.pool:29473] ERROR: The daemon exited unexpectedly with
>>> status 2.
>>> [m45-037.pool:29473] [0,0,0] ORTE_ERROR_LOG: Timeout in
>>> file ../../../../orte/mca/pls/base/pls_base_orted_cmds.c at line 188
>>> [m45-037.pool:29473] [0,0,0] ORTE_ERROR_LOG: Timeout in
>>> file ../../../../../orte/mca/pls/rsh/pls_rsh_module.c at line 1190
>>> --------------------------------------------------------------------------
>>> mpirun was unable to cleanly terminate the daemons for this job.
>>> Returned value Timeout instead of ORTE_SUCCESS.
>>> --------------------------------------------------------------------------
>>> apellegr_at_m45-037:~$
>>>
>>> Can anybody help me?
>>> Thanks,
>>> ~Andrea
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users