Open MPI User's Mailing List Archives


From: Barry Evans (bevans_at_[hidden])
Date: 2007-04-01 09:16:09


Hello,

 

I'm having a bit of trouble running Open MPI 1.2 under Torque 2.1.8.

My script contains the following:

-----------------------------------------------
HPCC_HOME=/home/test/hpcc-1.0.0
ncpus=`wc -l $PBS_NODEFILE`
mpirun -np $ncpus $HPCC_HOME/hpcc
-----------------------------------------------

 

 

When I try to run on 4 nodes with 4 CPUs each, I receive the following in my
error file:

 

[node003:04409] [0,0,4] ORTE_ERROR_LOG: Not found in file odls_default_module.c at line 1188
[node008:06691] [0,0,1] ORTE_ERROR_LOG: Not found in file odls_default_module.c at line 1188
[node007:04352] [0,0,2] ORTE_ERROR_LOG: Not found in file odls_default_module.c at line 1188

--------------------------------------------------------------------------
Failed to find or execute the following executable:

Host:       node007
Executable: /var/spool/torque/aux//350.wc01

Cannot continue.
--------------------------------------------------------------------------
[node004:04364] [0,0,3] ORTE_ERROR_LOG: Not found in file odls_default_module.c at line 1188
--------------------------------------------------------------------------
Failed to find or execute the following executable:

Host:       node004
Executable: /var/spool/torque/aux//350.wc01

Cannot continue.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Failed to find or execute the following executable:

Host:       node003
Executable: /var/spool/torque/aux//350.wc01

Cannot continue.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Failed to find or execute the following executable:

Host:       node008
Executable: /var/spool/torque/aux//350.wc01

Cannot continue.
--------------------------------------------------------------------------
[node007:04352] [0,0,2] ORTE_ERROR_LOG: Not found in file orted.c at line 588
[node008:06691] [0,0,1] ORTE_ERROR_LOG: Not found in file orted.c at line 588
[node004:04364] [0,0,3] ORTE_ERROR_LOG: Not found in file orted.c at line 588
[node003:04409] [0,0,4] ORTE_ERROR_LOG: Not found in file orted.c at line 588
 
 
Has anyone seen this before? It seems odd that Open MPI would be trying
to execute what is effectively the host file. I added a sleep to the script
to make sure the file was being distributed, and sure enough, it was there.
I am able to run MVAPICH through Torque without issue, and Open MPI runs
fine from the command line.
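
One thing that may be worth checking, sketched here under the assumption of a POSIX-like shell: when `wc -l` is given a file name as an argument, it prints the line count followed by that name, so a variable captured that way holds two words. Expanded unquoted after `-np`, the second word would land where mpirun expects the executable, which would be consistent with the path shown in the error output. The temporary file and the variable names below are illustrative stand-ins, not the real `$PBS_NODEFILE`.

```shell
# Stand-in for $PBS_NODEFILE (hypothetical): 4 nodes x 4 slots = 16 lines.
nodefile=$(mktemp)
for node in node003 node004 node007 node008; do
    for slot in 1 2 3 4; do
        echo "$node"
    done
done > "$nodefile"

# With a file-name argument, wc prints the count AND the file name.
with_name=$(wc -l "$nodefile")

# Reading from stdin leaves only the count; tr strips the padding
# that some wc implementations put in front of the number.
count_only=$(wc -l < "$nodefile" | tr -d ' ')

echo "wc -l FILE   -> $with_name"
echo "wc -l < FILE -> $count_only"

rm -f "$nodefile"
```

If that turns out to be the cause, capturing the count from standard input instead would leave the launch line as `mpirun -np $count_only $HPCC_HOME/hpcc`, with only a number after `-np`.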
 
Cheers,
Barry Evans
Technical Manager
OCF plc
+44 (0)7970 148 121
bevans_at_[hidden]