Open MPI Development Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

From: David Erukhimovich (daviderukh_at_[hidden])
Date: 2007-10-29 19:24:48


Hi,
I was just reviewing my files in order to send them to Jeff, and fixed the
problem!!
I should've written:
        mca_base_param_string_name("rds_hostfile", "path" . . . );
instead of:
        mca_base_param_string("rds_hostfile", "path" . . .);
in the component file, 'open' function.

But I don't understand how it compiled. There is no function
mca_base_param_string that takes a string as its first param (I know it doesn't
compile in the module file).
I compile using 'make all install' in the openmpi dir.

Thanks
--David
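
The difference between the two calls above can be illustrated with a toy model. This is only a sketch and not Open MPI code: it assumes, as the thread suggests, that one variant derives the parameter's full name from a component descriptor while the other takes a name prefix as a string, and that all parameters live in one global table. Every identifier in it (`toy_component_t`, `toy_set`, `toy_param_string`, `toy_param_string_name`) is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a component descriptor (NOT the Open MPI struct). */
typedef struct {
    const char *type_name;      /* framework name, e.g. "rds" */
    const char *component_name; /* component name, e.g. "mosix" */
} toy_component_t;

#define TOY_MAX_PARAMS 16
#define TOY_NAME_LEN   64

/* One global table: any component may read any parameter by its full name. */
static struct {
    char name[TOY_NAME_LEN];
    const char *value;
} toy_table[TOY_MAX_PARAMS];
static int toy_count = 0;

/* Store (or overwrite) a value under a full parameter name. */
static void toy_set(const char *full_name, const char *value)
{
    for (int i = 0; i < toy_count; ++i) {
        if (strcmp(toy_table[i].name, full_name) == 0) {
            toy_table[i].value = value;
            return;
        }
    }
    assert(toy_count < TOY_MAX_PARAMS);
    snprintf(toy_table[toy_count].name, TOY_NAME_LEN, "%s", full_name);
    toy_table[toy_count].value = value;
    ++toy_count;
}

/* Return the stored value, or the default if nothing was set. */
static const char *toy_get(const char *full_name, const char *dflt)
{
    for (int i = 0; i < toy_count; ++i) {
        if (strcmp(toy_table[i].name, full_name) == 0) {
            return toy_table[i].value;
        }
    }
    return dflt;
}

/* Component-based lookup: the name becomes <type>_<component>_<param>. */
static const char *toy_param_string(const toy_component_t *c,
                                    const char *param, const char *dflt)
{
    char full[TOY_NAME_LEN];
    snprintf(full, sizeof full, "%s_%s_%s",
             c->type_name, c->component_name, param);
    return toy_get(full, dflt);
}

/* Name-based lookup: the caller supplies the prefix, giving <prefix>_<param>. */
static const char *toy_param_string_name(const char *prefix,
                                         const char *param, const char *dflt)
{
    char full[TOY_NAME_LEN];
    snprintf(full, sizeof full, "%s_%s", prefix, param);
    return toy_get(full, dflt);
}
```

In this model, if the launcher stores the path under `rds_hostfile_path`, a component named `mosix` that uses the component-based lookup asks for `rds_mosix_path` and only ever sees the default, while the name-based lookup with the prefix `"rds_hostfile"` finds the stored value.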

On Mon, 29 Oct 2007, Jeff Squyres wrote:

> Sorry guys, I did miss this earlier.
>
> I don't see a patch anywhere in the e-mail thread below -- can
> someone send me the problematic code in question?
>
> FWIW: The MCA param space is global, so there's no reason that a new/
> different RDS shouldn't be able to read the hostfile MCA parameter.
>
>
>
> On Oct 28, 2007, at 2:09 PM, Ralph Castain wrote:
>
> > Yo Jeff
> >
> > This may have slipped through your inbox (had OMPI devel in
> > subject, so may
> > have been caught in some filter) - could you please provide any
> > thoughts on
> > why the hostfile isn't getting picked up correctly? As I indicated
> > on the
> > prior note, I verified that it is working for the default hostfile
> > component
> > - I can't see anything wrong in David's call to cause the problem.
> > Please
> > refer to the prior note for that code.
> >
> > Thanks
> > Ralph
> >
> >
> >
> > On 10/28/07 10:31 AM, "David Erukhimovich"
> > <daviderukh_at_[hidden]> wrote:
> >
> > >> Thank you very much for the patch, it helped me a lot (it works!) and
> > >> I really appreciate it.
> >>
> >> p.s. Any idea about the rds thing?
> >>
> >> Regards
> >> --David
> >>
> >>
> >> Ralph H Castain wrote:
> >>> Hi David
> >>>
> >>> Here is the promised patch - it passes params just fine, but I
> >>> cannot vouch
> >>> for any unintended consequences. I -think- it will be fine, but
> >>> it lacks all
> >>> the usual testing for a patch to an official release.
> >>>
> >>> Hope it helps
> >>> Ralph
> >>>
> >>>
> >>>
> >>> On 10/20/07 10:10 AM, "David Erukhimovich"
> >>> <daviderukh_at_[hidden]> wrote:
> >>>
> >>>>
> >>>> Hi Ralph,
> >>>>
> >>>> 2. I do want the user to be able to switch between my way of
> >>>> process
> >>>> launching, and the default way. I can do it using an mca flag,
> >>>> but I would
> > >>>> prefer a new component. If it is not too difficult for you,
> >>>> please make the
> >>>> patch, if it is, I'll just use an mca flag.
> >>>>
> > >>>> 1. Just remembered another difficulty I had: I've created a new
> > >>>> rds
> > >>>> component identical to the hostfile one. Let's call it mosix.
> > >>>> Now, orterun
> > >>>> is saving the hostfile path in the mca parameter -
> > >>>> rds_hostfile_path or
> > >>>> something like that. When I try to retrieve rds_hostfile_path or
> > >>>> rds_mosix_path in the rds_mosix component I always get the default
> > >>>> hostfile path
> > >>>> (it doesn't matter whether I gave a hostfile or not). And I tried
> >>>> everything -
> >>>> changing names in rds_mosix_component, declaring a new parameter
> >>>> rds_mosix_path in various places etc. So now I'm just altering
> >>>> the existing
> >>>> hostfile component.
> >>>> Do you have any suggestions how to make it work?
> >>>>
> >>>> Sorry for all the questions and thank you very much for the
> >>>> quick answers
> >>>>
> >>>> Regards
> >>>> --David
> >>>>
> >>>> ---------- Forwarded message ----------
> >>>> From: Ralph Castain <rhc_at_[hidden]>
> >>>> Date: Oct 20, 2007 5:12 PM
> >>>> Subject: Re: [OMPI devel] Trying to get total procs num in odls
> >>>> framework
> >>>> To: David Erukhimovich <daviderukh_at_[hidden]>
> >>>>
> >>>> Hi David
> >>>>
> >>>> Thanks for the info - see comments below.
> >>>>
> >>>> Ralph
> >>>>
> >>>>
> >>>> On 10/20/07 6:58 AM, "David Erukhimovich"
> >>>> <daviderukh_at_[hidden]> wrote:
> >>>>
> >>>>> Hi
> >>>>> Thank you for your answer.
> >>>>>
> > >>>>> First of all, my two questions weren't connected; they belong to
> > >>>>> different
> > >>>>> parts of my project. And the subject of the mail should have
> > >>>>> been: Trying
> > >>>>> to
> > >>>>> get total procs num in rds framework (sorry, my mistake).
> >>>>>
> > >>>>> Here are the parts, in the order of the last email:
> >>>>>
> >>>>> 1. I've solved the problem about getting total num of procs in
> >>>>> rds (just
> >>>>> called some function incorrectly), so sorry for disturbing you
> >>>>> about
> > >>>>> that.
> >>>>> Now a bit more about what I'm trying to do, maybe there is a
> >>>>> better way
> > >>>>> than
> >>>>> mine:
> > >>>>> I have a tool (an external application) that, given a list of
> > >>>>> machines and a
> > >>>>> number n, chooses the n best ones from the list (the least
> > >>>>> loaded ones)
> > >>>>> and,
> > >>>>> if the list of machines isn't given, just returns the n best
> > >>>>> machines
> > >>>>> from the cluster. I wish to include this in ompi. Hence,
> > >>>>> given a
> > >>>>> machinefile, it'll run the process only on the best nodes. If a
> > >>>>> machinefile
> > >>>>> isn't given, it'll take the best node that my application returns.
> >>>>> I think the best place to implement it is in rds - after
> >>>>> building the list
> >>>>> of newly discovered nodes: if it is empty, fill it using my tool,
> > >>>>> otherwise
> > >>>>> filter it using my tool. It seems to me the most logical way to
> > >>>>> do it. Am
> > >>>>> I
> > >>>>> right? I am asking you because I guess you have better
> > >>>>> knowledge of the ompi
> > >>>>> architecture.
> >>>> It sounds like the correct place to me. At some point in the
> >>>> future, you
> >>>> could migrate that logic to the RAS instead, but I would just
> >>>> continue as
> >>>> you are doing for now.
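
The selection logic David describes (filter a given machine list down to the n least loaded nodes, or fall back to the whole cluster when no list is given) could be sketched roughly as follows. Everything here is hypothetical: `toy_node_t`, its `load` metric, and `toy_pick_best` are illustrative names and correspond neither to the RDS interface nor to David's external tool.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node record; "load" is an assumed metric, lower is better. */
typedef struct {
    const char *name;
    double load;
} toy_node_t;

/* qsort comparator: ascending by load. */
static int toy_by_load(const void *a, const void *b)
{
    double la = ((const toy_node_t *)a)->load;
    double lb = ((const toy_node_t *)b)->load;
    return (la > lb) - (la < lb);
}

/*
 * Copy up to n least-loaded nodes into out (which must hold n entries).
 * If the discovered list is empty, choose from the whole cluster instead.
 * Returns the number of nodes written, or 0 on allocation failure.
 */
static size_t toy_pick_best(const toy_node_t *discovered, size_t n_discovered,
                            const toy_node_t *cluster, size_t n_cluster,
                            size_t n, toy_node_t *out)
{
    const toy_node_t *src = (n_discovered > 0) ? discovered : cluster;
    size_t n_src = (n_discovered > 0) ? n_discovered : n_cluster;
    size_t k = (n < n_src) ? n : n_src;
    toy_node_t *tmp;

    assert(out != NULL);
    if (k == 0) {
        return 0;
    }

    /* Sort a private copy so the caller's lists are left untouched. */
    tmp = malloc(n_src * sizeof *tmp);
    if (tmp == NULL) {
        return 0;
    }
    memcpy(tmp, src, n_src * sizeof *tmp);
    qsort(tmp, n_src, sizeof *tmp, toy_by_load);
    memcpy(out, tmp, k * sizeof *tmp);
    free(tmp);
    return k;
}
```

Sorting a copy keeps the sketch short; for large clusters a partial selection would avoid sorting the whole list.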
> >>>>
> > >>>>> 2. The other thing I am trying to do is to make ompi run
> > >>>>> every process,
> > >>>>> not directly, but through an external program. E.g., if I want to
> > >>>>> launch the
> > >>>>> program "hostname", I want the following to be launched:
> > >>>>> "<my-program> <my-program's-flags> hostname".
> > >>>>> I figured that the best way to do it is in the odls framework
> > >>>>> because there I
> > >>>>> have the exact execution point.
> >>>> I guess I wouldn't do it that way if I were doing a project of
> >>>> my own. I
> >>>> would just go into the default odls module and hardcode the
> >>>> revised launch.
> >>>> I can't see this coming back into the production system, so
> >>>> unless you have
> >>>> some reason to want to run both with and without your revision,
> >>>> why go
> >>>> through the pain?
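
The wrapped launch discussed above amounts to prepending the wrapper's argument vector to the application's. A minimal sketch of that step, with a hypothetical helper name (`toy_wrap_argv`) and no connection to the actual odls code:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Build a new NULL-terminated argv consisting of the wrapper's arguments
 * followed by the application's arguments. Only the array is allocated;
 * the strings are shared with the inputs. The caller frees the result.
 * Returns NULL on allocation failure.
 */
static char **toy_wrap_argv(char *const wrapper[], char *const app[])
{
    size_t nw = 0, na = 0, i;
    char **out;

    assert(wrapper != NULL && app != NULL);
    while (wrapper[nw] != NULL) {
        ++nw;
    }
    while (app[na] != NULL) {
        ++na;
    }

    out = malloc((nw + na + 1) * sizeof *out);
    if (out == NULL) {
        return NULL;
    }
    for (i = 0; i < nw; ++i) {
        out[i] = wrapper[i];
    }
    for (i = 0; i < na; ++i) {
        out[nw + i] = app[i];
    }
    out[nw + na] = NULL;
    return out;
}
```

On a POSIX system the result could then be handed to something like `execvp(out[0], out)` at the point where the process would otherwise be started directly.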
> >>>>
> >>>>> I am currently working on the checkpoint 1.2.3. I don't work on
> >>>>> the trunk
> >>>>> because I need the patches to be added on some stable release.
> > >>>>> Is there a
> > >>>>> 1.2.* release where the bug is fixed? And if not, when can
> > >>>>> such a fixed
> > >>>>> version be stable?
> >>>> I don't think there are any plans to backport that fix, though I
> >>>> imagine it
> >>>> could be done. If not, I could try and create a patch for you
> >>>> next week,
> >>>> though I would again suggest you just hardcode your change into
> >>>> the existing
> >>>> odls default component to make your life easier.
> >>>>
> >>>> Ralph
> >>>>
> >>>>> Thank you
> > >>>>> --David
> >>>>>
> >>>>> ---------- Forwarded message ----------
> >>>>> From: Ralph Castain <rhc_at_[hidden]>
> >>>>> Date: Oct 17, 2007 11:22 PM
> >>>>> Subject: Re: [OMPI devel] Trying to get total procs num in odls
> >>>>> framework
> >>>>> To: daviderukh_at_[hidden]
> >>>>> Cc: "Jeff Squyres (jsquyres)" <jsquyres_at_[hidden]>
> >>>>>
> >>>>> Hi David
> >>>>>
> >>>>> I could probably answer your questions better if I had a better
> >>>>> understanding of what you are trying to do. For example,
> >>>>> looking in the
> >>>>> hostfile rds for the number of procs to be launched seems
> >>>>> strange as the
> >>>>> functional role of the framework is to simply learn what nodes are
> >>>>> available.
> >>>>>
> >>>>> It would also help to have some idea of what environment you
> >>>>> are working
> > >>>>> in,
> >>>>> and how you configured the beast.
> >>>>>
> >>>>> Please see comments below.
> >>>>> Ralph
> >>>>>
> >>>>>
> >>>>> On 10/17/07 2:47 PM, "Jeff Squyres" <jsquyres_at_[hidden]> wrote:
> >>>>>
> >>>>>> Yo Ralph --
> >>>>>>
> >>>>>> Can you answer these questions?
> >>>>>>
> >>>>>> Begin forwarded message:
> >>>>>>
> >>>>>>> From: David Erukhimovich <daviderukh_at_[hidden]>
> >>>>>>> Date: October 14, 2007 5:08:45 PM EDT
> >>>>>>> To: devel_at_[hidden]
> >>>>>>> Subject: [OMPI devel] Trying to get total procs num in odls
> >>>>>>> framework
> >>>>>>> Reply-To: Open MPI Developers <devel_at_[hidden]>
> >>>>>>>
> >>>>>>> Hello,
> >>>>>>> I have 2 questions:
> >>>>>>> 1. I am trying to get the total number of requested processes
> >>>>>>> for
> >>>>>>> the job
> > >>>>>>> in the 'hostfile' component in rds. I took the job object that was
> >>>>>>> given as a
> >>>>>>> parameter, extracted the application objects and checked how
> >>>>>>> many
> >>>>>>> procs
> >>>>>>> each application has. The result in every run was 0. As I
> >>>>>>> understand, this
> >>>>>>> variable is updated before the rds part. So what am I doing
> >>>>>>> wrong?
> >>>>> Do you mean you took the jobid given to the hostfile RDS (which
> >>>>> isn't an
> >>>>> object, but just a number) and did an orte_rmgr.get_app_context
> >>>>> to get the
> >>>>> array of app_contexts? Is there some reason why you would want
> >>>>> to do that
> >>>>> there?
> >>>>>
> >>>>> Depending upon what the command line looks like, it is possible
> >>>>> for the
> >>>>> number of procs to be zero - we allow that option and then fill
> >>>>> in the
> >>>>> number later. If it was specified, though, we do insert the
> >>>>> number in the
> >>>>> app_context object.
> >>>>>
> >>>>> Maybe you could tell me what the command line looks like, the
> >>>>> function
> > >>>>> call
> >>>>> you used to get the "application objects", and what field you
> >>>>> were looking
> >>>>> at when you found zero?
> >>>>>
> >>>>>>> 2. I've discovered an undocumented framework - odls.
> >>>>> It wasn't exactly hidden...we haven't documented it because we
> >>>>> are lazy
> > >>>>> and
> >>>>> the existing components cover every known environment (or so we
> >>>>> thought).
> >>>>> ;-)
> >>>>>
> >>>>> Is there some special reason to want to create another one?
> >>>>>
> >>>>>>> I've created a
> >>>>>>> new
> >>>>>>> component for it. The problem is that there is no way to switch
> >>>>>>> between
> >>>>>>> the default component and mine (--mca odls <my component>
> >>>>>>> doesn't
> >>>>>>> work).
> >>>>>>> Is there a way to switch between odls components (I saw bprocs
> >>>>>>> there and
> >>>>>>> I guess it is used)?
> >>>>> Are you working on the trunk? What r level?
> >>>>>
> >>>>> Reason I ask: I recently fixed a problem where the command line
> >>>>> mca params
> >>>>> were not getting passed to the orteds. Your description looks
> >>>>> like you
> >>>>> haven't picked up that change. If you have updated recently,
> >>>>> and you still
> >>>>> can't get it to work, then we likely have a lingering problem.
> >>>>>
> >>>>>
> >>>>> If I read your subject line correctly, then I am somewhat
> >>>>> puzzled. You can
> >>>>> look at the orte/mca/odls/base/odls_base_default_fns.c file, the
> >>>>> orte_odls_base_default_get_add_procs_data function and see
> >>>>> where we get
> > >>>>> the
> >>>>> total number of procs in a job and how that is passed to the
> >>>>> orteds. If
> > >>>>> you
> >>>>> have some new environment that the existing odls components
> >>>>> can't handle,
> >>>>> then I would strongly suggest you at least use the default
> >>>>> functions in
> > >>>>> the
> >>>>> base to provide as much support as possible as this will help
> >>>>> you to keep
> >>>>> pace with changes in the system.
> >>>>>
> >>>>> I would also welcome feedback on what you encountered that
> >>>>> required a new
> >>>>> odls component - perhaps we can modify the base support
> >>>>> functions to make
> > >>>>> it
> >>>>> fit within one of the existing components.
> >>>>>
> >>>>> Thanks
> >>>>> Ralph
> >>>>>
> >>>>>
> >>>>>>> Thank you,
> >>>>>>> --David
> >>>>>>> _______________________________________________
> >>>>>>> devel mailing list
> >>>>>>> devel_at_[hidden]
> >>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>>
> >>
>
>
> --
> Jeff Squyres
> Cisco Systems
>
>