Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] problem calling mpirun from script invoked with mpirun
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-10-28 16:30:49


Normally, one simply sets LD_LIBRARY_PATH in the environment to point to
the right thing. Alternatively, you could configure OMPI with

--enable-mpirun-prefix-by-default

This tells OMPI to automatically add the prefix you configured the system
with to your LD_LIBRARY_PATH and PATH envars. It should solve your problem
if you would rather not set those values in your environment yourself.
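
For the first option, setting the values by hand, a minimal sketch of what could go near the top of a job submission script is shown here. It assumes the install prefix is /home/sluke (the --prefix value given later in this thread) and that the libraries live in a lib subdirectory of it; the OMPI_PREFIX variable name is made up for the illustration. The directory holding the Intel compiler runtime libraries is site-specific and is not shown.

```shell
# Assumption: Open MPI was installed under this prefix (the original poster
# configured with --prefix=/home/sluke). OMPI_PREFIX is a hypothetical name;
# override it for a different install location.
OMPI_PREFIX="${OMPI_PREFIX:-/home/sluke}"

# Put the Open MPI executables (mpirun, ompi_info, ...) first on PATH.
export PATH="$OMPI_PREFIX/bin:$PATH"

# Put the Open MPI libraries first on LD_LIBRARY_PATH. The ${VAR:+...} form
# avoids a dangling ':' when LD_LIBRARY_PATH was previously unset or empty.
export LD_LIBRARY_PATH="$OMPI_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Show what a later mpirun in this script would pick up first.
echo "PATH starts with: ${PATH%%:*}"
echo "LD_LIBRARY_PATH starts with: ${LD_LIBRARY_PATH%%:*}"
```

With the configure option above, the same two variables should be handled by mpirun itself, so lines like these would only be needed for other libraries.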

Ralph

On Wed, Oct 28, 2009 at 2:10 PM, Luke Shulenburger
<lshulenburger_at_[hidden]> wrote:

> Thanks for the quick reply. This leads me to another issue I have
> been having with Open MPI as it relates to SGE. The "tight
> integration" works, in that I do not have to give mpirun a hostfile
> when I use the scheduler, but it does not seem to pass on my
> environment variables. Specifically, because I used the Intel compilers
> to compile Open MPI, I have to be sure to set LD_LIBRARY_PATH
> correctly in my job submission script or Open MPI will not run (giving
> the error discussed in the FAQ). Where I am a little lost is whether
> this is a problem with the way I built Open MPI or a configuration
> problem with SGE.
>
> This may be unrelated to my previous problem, but the similarities
> with the environment variables made me think of it.
>
> Thanks for your consideration,
> Luke Shulenburger
> Geophysical Laboratory
> Carnegie Institution of Washington
>
> On Wed, Oct 28, 2009 at 3:48 PM, Ralph Castain <rhc_at_[hidden]> wrote:
> > I'm afraid we have never really supported this kind of nested invocation
> > of mpirun. If it works with any version of OMPI, it is totally a fluke -
> > it might work one time, and then fail the next.
> >
> > The problem is that we pass envars to the launched processes to control
> > their behavior, and these conflict with what mpirun needs. We have tried
> > various scrubbing mechanisms (i.e., having mpirun start out by scrubbing
> > the environment of envars that would have come from the initial mpirun),
> > but they all have the unfortunate possibility of removing parameters
> > provided by the user - and that can cause its own problems.
> >
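
The scrubbing trade-off described above can be illustrated with a small sketch. It assumes the variables passed down by mpirun all start with OMPI_, which is also the prefix (OMPI_MCA_) used for MCA parameters set through the environment. The function names list_ompi_vars and scrub_ompi_env and the exported values are made up for the demonstration. This is not a supported way to make nested mpirun work; it only shows why a blanket scrub also throws away what the user set.

```shell
# Print the names of all environment variables that start with OMPI_.
list_ompi_vars() {
    env | sed -n 's/^\(OMPI_[A-Za-z0-9_]*\)=.*/\1/p'
}

# Unset every OMPI_* variable in the current shell (a blanket scrub).
scrub_ompi_env() {
    for name in $(list_ompi_vars); do
        unset "$name"
    done
}

# Demonstration with made-up values:
export OMPI_COMM_WORLD_RANK=0    # the kind of variable an outer mpirun passes down
export OMPI_MCA_btl=tcp,self     # an MCA parameter the user set on purpose

scrub_ompi_env

# The scrub cannot tell the two apart, so the user's setting is gone as well.
echo "OMPI_MCA_btl is now: ${OMPI_MCA_btl:-<unset>}"
```
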
> > I don't know if we will ever support nested operations - occasionally, I
> > do give it some thought, but have yet to find a foolproof solution.
> >
> > Ralph
> >
> >
> > On Wed, Oct 28, 2009 at 1:11 PM, Luke Shulenburger
> > <lshulenburger_at_[hidden]> wrote:
> >>
> >> Hello,
> >> I am having trouble with a script that calls mpi. Basically my
> >> problem distills to wanting to call a script with:
> >>
> >> mpirun -np # ./script.sh
> >>
> >> where script.sh looks like:
> >> #!/bin/bash
> >> mpirun -np 2 ./mpiprogram
> >>
> >> Whenever I invoke script.sh normally (as ./script.sh for instance) it
> >> works fine, but if I do mpirun -np 2 ./script.sh I get the following
> >> error:
> >>
> >> [ppv.stanford.edu:08814] [[27860,1],0] ORTE_ERROR_LOG: A message is
> >> attempting to be sent to a process whose contact information is
> >> unknown in file rml_oob_send.c at line 105
> >> [ppv.stanford.edu:08814] [[27860,1],0] could not get route to
> >> [[INVALID],INVALID]
> >> [ppv.stanford.edu:08814] [[27860,1],0] ORTE_ERROR_LOG: A message is
> >> attempting to be sent to a process whose contact information is
> >> unknown in file base/plm_base_proxy.c at line 86
> >>
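
Since nested invocation is described in this thread as unsupported, a script like the one above could at least detect that it was started by mpirun and decline to start a second one. The sketch below assumes that mpirun exports OMPI_COMM_WORLD_RANK to the processes it launches, which to my knowledge holds for the 1.3 series and later but should be checked against the installed version. The function name launched_by_mpirun is made up for the illustration.

```shell
#!/bin/bash
# Return success when this process appears to have been started by mpirun.
# Assumption: mpirun exports OMPI_COMM_WORLD_RANK to each launched process.
launched_by_mpirun() {
    [ -n "${OMPI_COMM_WORLD_RANK:-}" ]
}

if launched_by_mpirun; then
    # Refuse to nest instead of failing later with a routing error.
    echo "script.sh: started by mpirun (rank ${OMPI_COMM_WORLD_RANK}); not starting a nested mpirun" >&2
else
    # Not under mpirun: here the script would go on to run
    #   mpirun -np 2 ./mpiprogram
    echo "script.sh: not under mpirun; a plain mpirun call would be the normal case"
fi
```
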
> >> I have also tried running with mpirun -d to get some debugging info
> >> and it appears that the proctable is not being created for the second
> >> mpirun. The command hangs like so:
> >>
> >> [ppv.stanford.edu:08823] procdir:
> >> /tmp/openmpi-sessions-sluke_at_[hidden]_0/27855/0/0
> >> [ppv.stanford.edu:08823] jobdir:
> >> /tmp/openmpi-sessions-sluke_at_[hidden]_0/27855/0
> >> [ppv.stanford.edu:08823] top: openmpi-sessions-sluke_at_[hidden]_0
> >> [ppv.stanford.edu:08823] tmp: /tmp
> >> [ppv.stanford.edu:08823] [[27855,0],0] node[0].name ppv daemon 0 arch
> >> ffc91200
> >> [ppv.stanford.edu:08823] Info: Setting up debugger process table for
> >> applications
> >> MPIR_being_debugged = 0
> >> MPIR_debug_state = 1
> >> MPIR_partial_attach_ok = 1
> >> MPIR_i_am_starter = 0
> >> MPIR_proctable_size = 1
> >> MPIR_proctable:
> >> (i, host, exe, pid) = (0, ppv.stanford.edu,
> >> /home/sluke/maintenance/openmpi-1.3.3/examples/./shell.sh, 8824)
> >> [ppv.stanford.edu:08825] procdir:
> >> /tmp/openmpi-sessions-sluke_at_[hidden]_0/27855/1/0
> >> [ppv.stanford.edu:08825] jobdir:
> >> /tmp/openmpi-sessions-sluke_at_[hidden]_0/27855/1
> >> [ppv.stanford.edu:08825] top: openmpi-sessions-sluke_at_[hidden]_0
> >> [ppv.stanford.edu:08825] tmp: /tmp
> >> [ppv.stanford.edu:08825] [[27855,1],0] ORTE_ERROR_LOG: A message is
> >> attempting to be sent to a process whose contact information is
> >> unknown in file rml_oob_send.c at line 105
> >> [ppv.stanford.edu:08825] [[27855,1],0] could not get route to
> >> [[INVALID],INVALID]
> >> [ppv.stanford.edu:08825] [[27855,1],0] ORTE_ERROR_LOG: A message is
> >> attempting to be sent to a process whose contact information is
> >> unknown in file base/plm_base_proxy.c at line 86
> >> [ppv.stanford.edu:08825] Info: Setting up debugger process table for
> >> applications
> >> MPIR_being_debugged = 0
> >> MPIR_debug_state = 1
> >> MPIR_partial_attach_ok = 1
> >> MPIR_i_am_starter = 0
> >> MPIR_proctable_size = 0
> >> MPIR_proctable:
> >>
> >>
> >> In this case, it does not matter what the ultimate mpiprogram I try to
> >> run is, the shell script fails in the same way regardless (I've tried
> >> the hello_f90 executable from the openmpi examples directory). Here
> >> are some details of my setup:
> >>
> >> I have built openmpi 1.3.3 with the Intel Fortran and C compilers
> >> (version 11.1). The machine uses Rocks with the SGE scheduler, so I
> >> have run configure as ./configure --prefix=/home/sluke --with-sge;
> >> however, this problem persists even if I am running on the head node
> >> outside of the scheduler. I am attaching the resulting config.log to
> >> this email as well as the output of ompi_info --all and ifconfig. I hope
> >> this gives the experts on the list enough to go on, but I will be
> >> happy to provide any more information that might be helpful.
> >>
> >> Luke Shulenburger
> >> Geophysical Laboratory
> >> Carnegie Institution of Washington
> >>
> >>
> >> PS I have tried this on a machine with openmpi-1.2.6 and cannot
> >> reproduce the error, however on a second machine with openmpi-1.3.2 I
> >> have the same problem.
> >>
> >> _______________________________________________
> >> users mailing list
> >> users_at_[hidden]
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users