Hello,
I'm having a bit of trouble running Open MPI 1.2 under Torque
2.1.8.
My script contains the following:
-----------------------------------------------
HPCC_HOME=/home/test/hpcc-1.0.0
ncpus=`wc -l $PBS_NODEFILE`
mpirun -np $ncpus $HPCC_HOME/hpcc
-----------------------------------------------
When I try to run on 4 nodes with 4 CPUs each, I receive the
following in my error file:
[node003:04409] [0,0,4] ORTE_ERROR_LOG: Not found in file
odls_default_module.c at line 1188
[node008:06691] [0,0,1] ORTE_ERROR_LOG: Not found in file
odls_default_module.c at line 1188
[node007:04352] [0,0,2] ORTE_ERROR_LOG: Not found in file
odls_default_module.c at line 1188
--------------------------------------------------------------------------
Failed to find or execute the following executable:
Host: node007
Executable: /var/spool/torque/aux//350.wc01
Cannot continue.
--------------------------------------------------------------------------
[node004:04364] [0,0,3] ORTE_ERROR_LOG: Not found in file
odls_default_module.c at line 1188
--------------------------------------------------------------------------
Failed to find or execute the following executable:
Host: node004
Executable: /var/spool/torque/aux//350.wc01
Cannot continue.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Failed to find or execute the following executable:
Host: node003
Executable: /var/spool/torque/aux//350.wc01
Cannot continue.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Failed to find or execute the following executable:
Host: node008
Executable: /var/spool/torque/aux//350.wc01
Cannot continue.
--------------------------------------------------------------------------
[node007:04352] [0,0,2] ORTE_ERROR_LOG: Not found in file
orted.c at line 588
[node008:06691] [0,0,1] ORTE_ERROR_LOG: Not found in file
orted.c at line 588
[node004:04364] [0,0,3] ORTE_ERROR_LOG: Not found in file
orted.c at line 588
[node003:04409] [0,0,4] ORTE_ERROR_LOG: Not found in file
orted.c at line 588
Has anyone seen this before? It seems odd that Open MPI would
be trying to execute what is effectively the host file. I added a sleep to the
script to make sure the file was being distributed, and sure enough, it was
there. I am able to run MVAPICH through Torque without issue, and Open MPI runs
fine from the command line.
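One likely explanation for mpirun treating the host file as the executable (offered as a hedged sketch, not a confirmed diagnosis): `wc -l FILE` prints the line count followed by the file name, so `ncpus` ends up as something like `16 /var/spool/torque/aux//350.wc01`. Unquoted in `mpirun -np $ncpus ...`, that splits into two words, and mpirun takes the path as the program to launch, which matches the "Executable:" lines in the error output. The version below reads the node file on stdin so that wc prints only the count. The helper name `count_slots` and the `PBS_NODEFILE` guard are illustrative additions, and it assumes the node file has one line per CPU slot.

```shell
#!/bin/sh
# Hedged sketch, not a tested Torque job script.
# Assumes $PBS_NODEFILE lists one line per CPU slot.
HPCC_HOME=/home/test/hpcc-1.0.0

# count_slots is a hypothetical helper name used for illustration.
# `wc -l < FILE` prints only the count (no file name). The arithmetic
# expansion strips any leading padding some wc implementations add.
count_slots() {
    printf '%s\n' "$(( $(wc -l < "$1") ))"
}

# Only launch when running inside a Torque job that provides a node file.
if [ -n "${PBS_NODEFILE:-}" ]; then
    ncpus=$(count_slots "$PBS_NODEFILE")
    mpirun -np "$ncpus" "$HPCC_HOME/hpcc"
fi
```

For comparison, `set -- $(wc -l "$PBS_NODEFILE")` would leave two positional parameters (the count and the path), which is the same word splitting the original `mpirun -np $ncpus` line is subject to.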
Cheers,
Barry Evans
Technical Manager
OCF plc
+44 (0)7970 148 121
bevans@ocf.co.uk