Open MPI Development Mailing List Archives

From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2006-06-05 09:20:24


No worries!

This is actually an intended feature -- it allows node-specific
configuration, which matters most in heterogeneous clusters. The nodes
need not differ in architecture; one can easily imagine different
resources within the same cluster, such as different networks or
different amounts of RAM.

You make a good point about the values in that file, though -- I'll add
a note to the FAQ that such config files only take effect on the nodes
where they can be seen (i.e., mpirun does not bundle up these files and
send them to remote nodes at launch). Sorry for the confusion!
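
To make the per-node behavior concrete, here is a minimal Python sketch of
the lookup as described in this thread. It is an illustrative model only,
not Open MPI's actual implementation: read_param_file and resolve_param are
hypothetical helpers, the two temporary files stand in for
openmpi-mca-params.conf on the mpirun node and on a worker node, and
"unset" is a placeholder default. It assumes that a value passed on the
mpirun command line reaches each process as OMPI_MCA_<name> and takes
precedence over the node-local file.

```python
import os
import tempfile


def read_param_file(path):
    """Parse 'name = value' lines; '#' starts a comment. A missing file yields {}."""
    params = {}
    try:
        with open(path) as fh:
            for line in fh:
                line = line.split("#", 1)[0].strip()
                if "=" not in line:
                    continue
                name, value = line.split("=", 1)
                params[name.strip()] = value.strip()
    except FileNotFoundError:
        pass
    return params


def resolve_param(name, env, local_conf_path, default="unset"):
    """Resolve one parameter as seen by a process on a single node.

    Assumption of this sketch: a command-line value arrives in the
    environment as OMPI_MCA_<name> and wins over the local file.
    """
    env_key = "OMPI_MCA_" + name
    if env_key in env:
        return env[env_key]
    # Only the file visible on *this* node is consulted.
    return read_param_file(local_conf_path).get(name, default)


with tempfile.TemporaryDirectory() as tmp:
    head_conf = os.path.join(tmp, "head-openmpi-mca-params.conf")
    worker_conf = os.path.join(tmp, "worker-openmpi-mca-params.conf")
    with open(head_conf, "w") as fh:
        fh.write("mpi_yield_when_idle = 1\n")
    # The worker's file is deliberately never created.

    on_head = resolve_param("mpi_yield_when_idle", {}, head_conf)
    on_worker = resolve_param("mpi_yield_when_idle", {}, worker_conf)
    via_cmdline = resolve_param(
        "mpi_yield_when_idle",
        {"OMPI_MCA_mpi_yield_when_idle": "1"},
        worker_conf,
    )

print(on_head, on_worker, via_cmdline)  # prints: 1 unset 1
```

In this model the worker only sees the value when it arrives through the
environment, or when the same line is present in the worker's own copy of
the file.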
 

> -----Original Message-----
> From: devel-bounces_at_[hidden]
> [mailto:devel-bounces_at_[hidden]] On Behalf Of Paul Donohue
> Sent: Monday, June 05, 2006 8:50 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] Oversubscription/Scheduling Bug
>
> Sorry Brian and Jeff - I sent you chasing after something of
> a red herring...
>
> After much more testing and banging my head on the desk
> trying to figure this one out, it turns out '--mca
> mpi_yield_when_idle 1' on the command line does actually work
> properly for me... The one or two times I had previously
> tried using the command line argument, my app (by unfortunate
> coincidence - it took me a long time to figure this one out)
> happened to run slowly for completely unrelated reasons.
>
> However, instead of typing the command line argument each
> time, for the bulk of my testing I was instead putting
> 'mpi_yield_when_idle = 1' in
> /usr/local/etc/openmpi-mca-params.conf on the machine I ran
> 'mpirun' from. I didn't update that file on each of my
> worker nodes - only on the node I was running 'mpirun' from.
> I had assumed that this would have the same effect as typing
> '--mca mpi_yield_when_idle 1' on the command line - mpirun
> would read /usr/local/etc/openmpi-mca-params.conf, import all
> of the parameters, then propagate those parameters to the
> worker nodes as if the parameters were typed on the command
> line. Apparently, in reality, orted reads
> /usr/local/etc/openmpi-mca-params.conf on the local node
> where orted is actually running, and entries in the file on
> the node where 'mpirun' is run are not propagated. Is this a
> bug or an undocumented feature? ;)
>
> Sorry to have wasted your time chasing the wrong problem...
> -Paul
>
> On Fri, May 26, 2006 at 01:09:22PM -0400, Brian W. Barrett wrote:
> > On Fri, 26 May 2006, Brian W. Barrett wrote:
> >
> > > On Fri, 26 May 2006, Jeff Squyres (jsquyres) wrote:
> > >
> > >> You can see this by slightly modifying your test command -- run "env"
> > >> instead of "hostname". You'll see that the environment variable
> > >> OMPI_MCA_mpi_yield_when_idle is set to the value that you passed in on
> > >> the mpirun command line, regardless of a) whether you're oversubscribing
> > >> or not, and b) whatever is passed in through the orted.
> > >
> > > While Jeff is correct that the parameter informing the MPI process that it
> > > should idle when it's not busy is correctly set, it turns out that we are
> > > ignoring this parameter inside the MPI process. I'm looking into this and
> > > hope to have a fix this afternoon.
> >
> > Mea culpa. Jeff's right that in a normal application, we are setting up
> > to call sched_yield() when idle if the user sets mpi_yield_when_idle to 1,
> > regardless of what is in the hostfile. The problem with my test case was
> > that for various reasons, my test code was never actually "idling" - there
> > were always things moving along, so our progress engine was deciding that
> > the process should not be idled.
> >
> > Can you share your test code at all? I'm wondering if something similar
> > is happening with your code. It doesn't sound like it should be "always
> > working", but I'm wondering if you're triggering some corner case we
> > haven't thought of.
> >
> > Brian
> >
> > --
> > Brian Barrett
> > Graduate Student, Open Systems Lab, Indiana University
> > http://www.osl.iu.edu/~brbarret/
> > _______________________________________________
> > devel mailing list
> > devel_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >
>
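
Brian's explanation quoted above (the progress engine only idles a process
when nothing is moving) can be modeled with a short Python sketch. This is a
toy model under stated assumptions, not Open MPI's progress engine:
progress_until_done and its poll callback are hypothetical, a return value
of None is an invented "all work finished" signal, and os.sched_yield() is
only available on Unix-like platforms.

```python
import os


def progress_until_done(poll, yield_when_idle, max_polls=1000):
    """Call poll() repeatedly; give up the CPU only after an idle poll.

    poll() returns the number of events it progressed, or None when all
    work is finished (a convention invented for this sketch).
    Returns how many times the loop yielded.
    """
    yields = 0
    for _ in range(max_polls):
        events = poll()
        if events is None:
            break
        if events == 0 and yield_when_idle:
            os.sched_yield()  # let another runnable process use the core
            yields += 1
    return yields


# A source that always has work: the loop never idles, so it never yields,
# even though yielding is enabled.
busy = iter([1, 1, 1, 1, None])
# A source with three idle polls.
idle = iter([0, 0, 1, 0, None])

busy_yields = progress_until_done(lambda: next(busy), yield_when_idle=True)
idle_yields = progress_until_done(lambda: next(idle), yield_when_idle=True)
print(busy_yields, idle_yields)  # prints: 0 3
```

The first case mirrors the test described above: with something always in
flight, enabling the yield setting makes no observable difference.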