Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] "Failed to find the following executable" problem under Torque
From: Blosch, Edwin L (edwin.l.blosch_at_[hidden])
Date: 2009-09-28 12:17:41


Thanks for the reply. I looked harder at the command invocation and I think I stumbled across an answer. My actual mpirun command is invoked from a Python script using the subprocess module. When you create a subprocess, one of the options is "shell", and I had that set to False. That causes the actual invocation to use spawn or exec (one of the variants) instead of system(), so no shell is involved to split the arguments into words.
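To make the shell=False behavior concrete, here is a minimal sketch. It does not call mpirun; it assumes a stand-in child process (the running Python interpreter) that just prints the argument list it received. The helper name argv_seen_by_child is hypothetical and used only for illustration.

```python
import subprocess
import sys

# Stand-in child program: print each argument it received on its own line.
CHILD = "import sys; print('\\n'.join(sys.argv[1:]))"

def argv_seen_by_child(args):
    """Launch a child with shell=False and return the argv entries it saw."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD] + list(args),
        shell=False,          # no shell, so no word splitting happens
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()

# Each list element arrives as exactly one argv entry, spaces included.
fused = argv_seen_by_child(["--prefix /usr/mpi/intel/openmpi-1.2.8", "-np 8"])
split = argv_seen_by_child(["--prefix", "/usr/mpi/intel/openmpi-1.2.8", "-np", "8"])
print(len(fused), fused)
print(len(split), split)
```

In the first call the child sees two arguments, each containing an embedded space; in the second it sees four separate tokens.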

When I pass down the argument list as follows, mpirun fails with "cannot find executable named '--prefix /usr/mpi/intel/openmpi-1.2.8' "

  Command: ['mpirun', '--prefix /usr/mpi/intel/openmpi-1.2.8', '-np 8', '--mca btl ^tcp', ' --mca mpi_leave_pinned 1', '--mca mpool_base_use_mem_hooks 1', '-x LD_LIBRARY_PATH', '-x MPI_ENVIRONMENT=1', '/tmp/7852.fwnaeglingio/falconv4_ibm_openmpi', '-cycles', '10', '-ri', 'restart.5000', '-ro', '/tmp/7852.fwnaeglingio/restart.5000']

whereas if I take the additional step of splitting the arguments on spaces, so that each token is its own list element, it works:

  Command: ['mpirun', '--prefix', '/usr/mpi/intel/openmpi-1.2.8', '--machinefile', '/var/spool/torque/aux/7854.fwnaeglingio', '-np', '8', '--mca', 'btl', '^tcp', '--mca', 'mpi_leave_pinned', '1', '--mca', 'mpool_base_use_mem_hooks', '1', '-x', 'LD_LIBRARY_PATH', '-x', 'MPI_ENVIRONMENT=1', '/tmp/7854.fwnaeglingio/falconv4_ibm_openmpi', '-cycles', '10', '-ri', 'restart.5010', '-ro', '/tmp/7854.fwnaeglingio/restart.5010']
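One way to do that splitting step in Python is with the standard shlex module. The sketch below is only an illustration: the helper name flatten_args is hypothetical, and the input is a shortened version of the failing list above.

```python
import shlex

def flatten_args(args):
    """Split every list element on shell-style whitespace.

    Returns a flat list in which each token is its own element,
    which is the form that works with shell=False.
    """
    flat = []
    for arg in args:
        # shlex.split ignores leading/trailing spaces and honors quotes.
        flat.extend(shlex.split(arg))
    return flat

fused_command = [
    "mpirun",
    "--prefix /usr/mpi/intel/openmpi-1.2.8",
    "-np 8",
    "--mca btl ^tcp",
    " --mca mpi_leave_pinned 1",
    "-x LD_LIBRARY_PATH",
]
flat_command = flatten_args(fused_command)
print(flat_command)
```

The flattened list can then be passed to subprocess with shell=False. Note that shlex.split would also split a path that legitimately contains a space unless it is quoted, so this only suits arguments like the ones shown here.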

Somehow the handling of the argv list by orterun has changed in 1.2.8 as compared to 1.2.2-1, as the spawned command used to execute just fine.

I'm guessing the elements in argv used to be split on spaces first, before being parsed, whereas now they are not, resulting in the first string not being recognized as an option and therefore being taken as the name of the executable.
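The rule Jeff describes in the quoted message below (the first token mpirun does not recognize is assumed to be the executable) can be modeled with a toy parser. This is not orterun's actual code; the option table and the function name find_executable are assumptions made for illustration, and only a few options from the commands above are listed.

```python
# Hypothetical option table: option name -> number of values it consumes.
KNOWN_OPTIONS = {
    "--prefix": 1,
    "--machinefile": 1,
    "-np": 1,
    "--mca": 2,
    "-x": 1,
}

def find_executable(argv):
    """Return the first token that is not a known option or an option value."""
    i = 0
    while i < len(argv):
        token = argv[i]
        if token in KNOWN_OPTIONS:
            # Skip the option itself plus the values it takes.
            i += 1 + KNOWN_OPTIONS[token]
        else:
            # First unrecognized token: treat it as the executable.
            return token
    return None

fused = ["--prefix /usr/mpi/intel/openmpi-1.2.8", "-np 8", "/tmp/app"]
split = ["--prefix", "/usr/mpi/intel/openmpi-1.2.8", "-np", "8", "/tmp/app"]
print(find_executable(fused))
print(find_executable(split))
```

Under this model the fused list yields "--prefix /usr/mpi/intel/openmpi-1.2.8" as the supposed executable, matching the error text, while the split list yields the real program path ("/tmp/app" is a placeholder).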

> -----Original Message-----
> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On
> Behalf Of Jeff Squyres
> Sent: Saturday, September 26, 2009 8:24 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] "Failed to find the following executable"
> problem under Torque
>
> On Sep 25, 2009, at 7:55 AM, Blosch, Edwin L wrote:
>
> > I'm having a problem running OpenMPI under Torque. It complains
> > like there is a command syntax problem, but the three variations
> > below are all correct, best I can tell using mpirun -help. The
> > environment in which the command executes, i.e. PATH and
> > LD_LIBRARY_PATH, is correct. Torque is 2.3.x. OpenMPI is 1.2.8.
> > OFED is 1.4.
>
> Is your mpirun a script, perchance? It's almost like the arguments
> that end up being passed are getting munged / re-ordered, and Bad
> Things happen such that the real mpirun under the covers gets confused.
>
> > /usr/mpi/intel/openmpi-1.2.8/bin/mpirun -np 28 /tmp/43.fwnaeglingio/falconv4_ibm_openmpi -cycles 100 -ri restart.0 -ro /tmp/43.fwnaeglingio/restart.0
> > --------------------------------------------------------------------------
> > Failed to find the following executable:
> >
> > Host: n8n26
> > Executable: -p
>
> I don't even see -p in that argument list. Where is it coming from?
>
> A little background: OMPI's mpirun analyzes the command line tokens
> that are passed to it. The first one that it doesn't recognize, it
> assumes is the executable to invoke. In this case, OMPI's mpirun
> found a "-p" on the command line (not sure how that happened; perhaps
> /usr/mpi/intel/openmpi-1.2.8/bin/mpirun is not actually OMPI's mpirun,
> as I mentioned above...?) and tried to execute it. But since there was
> no executable named "-p" to be found in the filesystem, OMPI
> printed the error.
>
> > mpirun --prefix /usr/mpi/intel/openmpi-1.2.8 --machinefile /var/spool/torque/aux/45.fwnaeglingio -np 28 --mca btl ^tcp --mca mpi_leave_pinned 1 --mca mpool_base_use_mem_hooks 1 -x LD_LIBRARY_PATH -x MPI_ENVIRONMENT /tmp/45.fwnaeglingio/falconv4_ibm_openmpi -cycles 100 -ri restart.0 -ro /tmp/45.fwnaeglingio/restart.0
> > --------------------------------------------------------------------------
> > Failed to find or execute the following executable:
> >
> > Host: n8n27
> > Executable: --prefix /usr/mpi/intel/openmpi-1.2.8
>
> Ditto on this one. --prefix is a valid mpirun command line argument,
> so it should not have complained.
>
> But then again, I confess to not remembering all the 1.2.x command
> line options; I don't remember if --prefix was introduced in the 1.2
> or 1.3 series...
>
> > /usr/mpi/intel/openmpi-1.2.8/bin/mpirun -x LD_LIBRARY_PATH -x MPI_ENVIRONMENT=1 /tmp/47.fwnaeglingio/falconv4_ibm_openmpi -cycles 100 -ri restart.0 -ro /tmp/47.fwnaeglingio/restart.0
> > --------------------------------------------------------------------------
> > Failed to find the following executable:
> >
> > Host: n8n27
> > Executable: -
>
>
> Ditto to #1.
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]