Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] "Failed to find the following executable" problem under Torque
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-09-26 09:23:35


On Sep 25, 2009, at 7:55 AM, Blosch, Edwin L wrote:

> I’m having a problem running OpenMPI under Torque. It complains
> like there is a command syntax problem, but the three variations
> below are all correct, best I can tell using mpirun --help. The
> environment in which the command executes, i.e. PATH and
> LD_LIBRARY_PATH, is correct. Torque is 2.3.x. OpenMPI is 1.2.8.
> OFED is 1.4.

Is your mpirun a script, perchance? It's almost like the arguments
that end up being passed are getting munged / re-ordered, and Bad
Things happen such that the real mpirun under the covers gets confused.
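One rough way to check whether the mpirun being invoked is a wrapper script or a real binary is to look at the first few bytes of the file. The sketch below is only an illustration: the names classify_header and classify_file are hypothetical helpers, not part of Open MPI, and it only distinguishes "#!" scripts from ELF binaries.

```python
def classify_header(header: bytes) -> str:
    """Classify a file by its leading bytes (hypothetical helper)."""
    if header[:2] == b"#!":
        return "script"   # shebang line: a shell/Perl/Python wrapper
    if header[:4] == b"\x7fELF":
        return "elf"      # ELF magic number: a compiled binary
    return "unknown"


def classify_file(path: str) -> str:
    """Read the first four bytes of `path` and classify them."""
    with open(path, "rb") as f:
        return classify_header(f.read(4))
```

Calling classify_file("/usr/mpi/intel/openmpi-1.2.8/bin/mpirun") on a compute node would return "script" if that path is a wrapper; in that case the wrapper's own argument handling is the first place to look.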

> /usr/mpi/intel/openmpi-1.2.8/bin/mpirun -np 28 /tmp/43.fwnaeglingio/
> falconv4_ibm_openmpi -cycles 100 -ri restart.0 -ro /tmp/
> 43.fwnaeglingio/restart.0
> --------------------------------------------------------------------------
> Failed to find the following executable:
>
> Host: n8n26
> Executable: -p

I don't even see -p in that argument list. Where is it coming from?

A little background: OMPI's mpirun analyzes the command line tokens
that are passed to it. The first one that it doesn't recognize, it
assumes is the executable to invoke. In this case, OMPI's mpirun
found a "-p" on the command line (not sure how that happened; perhaps
/usr/mpi/intel/openmpi-1.2.8/bin/mpirun is not actually OMPI's mpirun,
as I mentioned above...?) and tried to execute it. Since there was no
executable named "-p" to be found in the filesystem, OMPI printed the
error.
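The "first unrecognized token is the executable" rule can be sketched as a toy model. This is not Open MPI's actual parser: the function name find_executable is hypothetical, and KNOWN_OPTIONS is a small hand-picked subset of options with the number of values each one consumes.

```python
# Toy model only: NOT Open MPI's real parser. Each entry maps an option
# to the number of values it consumes (assumed subset, for illustration).
KNOWN_OPTIONS = {
    "-np": 1,
    "--prefix": 1,
    "--machinefile": 1,
    "--mca": 2,
    "-x": 1,
}


def find_executable(tokens):
    """Return the first token that is neither a known option nor one of
    its values, or None if every token was consumed as an option."""
    i = 0
    while i < len(tokens):
        arity = KNOWN_OPTIONS.get(tokens[i])
        if arity is None:
            return tokens[i]      # first unrecognized token
        i += 1 + arity            # skip the option and its values
    return None
```

Under this model, find_executable(["-np", "28", "/tmp/app"]) picks "/tmp/app", while a mangled list such as ["-p", "28", "/tmp/app"] picks "-p". Likewise, if something upstream passed "--prefix /usr/mpi/intel/openmpi-1.2.8" as a single token, that whole string would be treated as the executable, which would be consistent with the kind of error message shown in the reports.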

> mpirun --prefix /usr/mpi/intel/openmpi-1.2.8 --machinefile /var/
> spool/torque/aux/45.fwnaeglingio -np 28 --mca btl ^tcp --mca
> mpi_leave_pinned 1 --mca mpool_base_use_mem_hooks 1 -x
> LD_LIBRARY_PATH -x MPI_ENVIRONMENT /tmp/45.fwnaeglingio/
> falconv4_ibm_openmpi -cycles 100 -ri restart.0 -ro /tmp/
> 45.fwnaeglingio/restart.0
> --------------------------------------------------------------------------
> Failed to find or execute the following executable:
>
> Host: n8n27
> Executable: --prefix /usr/mpi/intel/openmpi-1.2.8

Ditto on this one. --prefix is a valid mpirun command line argument,
so it should not have complained.

But then again, I confess to not remembering all the 1.2.x command
line options; I don't remember if --prefix was introduced in the 1.2
or 1.3 series...

> /usr/mpi/intel/openmpi-1.2.8/bin/mpirun -x LD_LIBRARY_PATH -x
> MPI_ENVIRONMENT=1 /tmp/47.fwnaeglingio/falconv4_ibm_openmpi -cycles
> 100 -ri restart.0 -ro /tmp/47.fwnaeglingio/restart.0
> --------------------------------------------------------------------------
> Failed to find the following executable:
>
> Host: n8n27
> Executable: -

Ditto to #1.

-- 
Jeff Squyres
jsquyres_at_[hidden]