Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] openmpi-1.7.4rc2r30425 produces unexpected output
From: tmishima_at_[hidden]
Date: 2014-01-28 18:11:09


Thanks, Ralph. I'm happy to hear that.

By the way, openmpi-1.7.4rc2 works fine for me.

Tetsuya Mishima

> Let me clarify: the functionality will remain as it is useful to many.
> What we need to do is somehow capture that command in the current map-by
> parameter so we avoid issues like the one you are experiencing.
>
> HTH
> Ralph
>
> On Jan 27, 2014, at 8:18 PM, tmishima_at_[hidden] wrote:
>
> >
> >
> > Thank you for your comment, Ralph.
> >
> > I understand your explanation including "it's too late".
> > The ppr option is convenient for us because our environment is quite
> > heterogeneous.
> > (It gives flexibility in the number of procs.)
> >
> > I hope you do not deprecate ppr in a future release, and that you apply
> > my proposal someday.
> >
> > Regards,
> > Tetsuya Mishima
> >
> >> I'm afraid it is too late for 1.7.4 as I have locked that down, barring
> >> any last-second smoke test failures. I'll give this some thought for
> >> 1.7.5, but I'm a little leery of the proposed change. The problem is that
> >> ppr comes in thru a different MCA param than the "map-by" param, and
> >> hence we can indeed get conflicts that we cannot resolve.
> >>
> >> This is one of those issues that I need to clean up in general. We've
> >> deprecated a number of params due to similar problems - the "ppr" policy
> >> is the last one on the list. It needs to somehow be folded into the
> >> "map-by" param, though it also influences the number of procs (unlike the
> >> other map-by directives).
> >>
> >>
> >> On Jan 27, 2014, at 7:46 PM, tmishima_at_[hidden] wrote:
> >>
> >>>
> >>>
> >>> Hi Ralph, it seems you are rounding the final turn to release 1.7.4!
> >>> I hope this will be my final request for openmpi-1.7.4 as well.
> >>>
> >>> I mostly use rr_mapper but sometimes use ppr_mapper. I have a simple
> >>> request to improve its usability. Namely, I propose removing the
> >>> redefining-policy check in rmaps_ppr_component.c
> >>> (lines 130-138):
> >>>
> >>> 130     if (ORTE_MAPPING_GIVEN & ORTE_GET_MAPPING_DIRECTIVE(orte_rmaps_base.mapping)) {
> >>> 131         /* if a non-default mapping is already specified, then we
> >>> 132          * have an error
> >>> 133          */
> >>> 134         orte_show_help("help-orte-rmaps-base.txt", "redefining-policy", true, "mapping",
> >>> 135                        "PPR", orte_rmaps_base_print_mapping(orte_rmaps_base.mapping));
> >>> 136         ORTE_SET_MAPPING_DIRECTIVE(orte_rmaps_base.mapping, ORTE_MAPPING_CONFLICTED);
> >>> 137         return ORTE_ERR_SILENT;
> >>> 138     }
> >>>
> >>> The reasons are as follows:
> >>>
> >>> 1) The final mapper to be used should be selected by the priority set
> >>> by the system or by an MCA param. The ppr_priority is fixed at 90 and
> >>> the rr_priority can be set by an MCA param (default = 10).
> >>>
> >>> 2) If we set "rmaps_base_mapping_policy = something" in
> >>> mca-params.conf, the -ppr option is always refused by this check, as
> >>> shown below:
> >>> [mishima_at_manage demos]$ mpirun -np 2 -ppr 1:socket ~/mis/openmpi/demos/myprog
> >>> --------------------------------------------------------------------------
> >>> Conflicting directives for mapping policy are causing the policy
> >>> to be redefined:
> >>>
> >>> New policy: PPR
> >>> Prior policy: BYSOCKET
> >>>
> >>> Please check that only one policy is defined.
> >>>
> >>> 3) As far as I have confirmed, this change does not seem to affect any
> >>> other behavior.
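
Reason 1 above proposes that the mapper be chosen purely by priority. A minimal sketch of that selection rule follows, using the two priority values mentioned in the message (90 for ppr, 10 as the round-robin default). The `mapper_t` type and `select_mapper` function are hypothetical names for illustration, not part of the ORTE rmaps framework.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a mapper component; not the real ORTE type. */
typedef struct {
    const char *name;
    int priority;
} mapper_t;

/*
 * Return the mapper with the highest priority, or NULL if the list is empty.
 * On a tie, the earliest entry in the array is kept.
 */
static const mapper_t *select_mapper(const mapper_t *mappers, size_t count)
{
    assert(mappers != NULL || count == 0);
    const mapper_t *best = NULL;
    for (size_t i = 0; i < count; i++) {
        if (best == NULL || mappers[i].priority > best->priority) {
            best = &mappers[i];
        }
    }
    return best;
}
```

With entries `{"round_robin", 10}` and `{"ppr", 90}`, this picks the ppr entry, which is the outcome the proposal asks for when -ppr is given.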
> >>>
> >>> Regards,
> >>> Tetsuya Mishima
> >>>
> >>>> Kewl - thanks!
> >>>>
> >>>> On Jan 27, 2014, at 4:08 PM, tmishima_at_[hidden] wrote:
> >>>>
> >>>>>
> >>>>>
> >>>>> Thanks, Ralph. I quickly checked the fix. It worked fine for me.
> >>>>>
> >>>>> Tetsuya Mishima
> >>>>>
> >>>>>> I fixed that in today's final cleanup
> >>>>>>
> >>>>>> On Jan 27, 2014, at 3:17 PM, tmishima_at_[hidden] wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> As for the NEWS - it is actually already correct. We default to
> >>>>>> map-by core, not slot, as of 1.7.4.
> >>>>>>
> >>>>>> Is that correct? As far as I can tell from the source code, map-by
> >>>>>> slot is used if np <= 2.
> >>>>>>
> >>>>>> [mishima_at_manage openmpi-1.7.4rc2r30425]$ cat -n orte/mca/rmaps/base/rmaps_base_map_job.c
> >>>>>> ...
> >>>>>> 107         /* default based on number of procs */
> >>>>>> 108         if (nprocs <= 2) {
> >>>>>> 109             opal_output_verbose(5, orte_rmaps_base_framework.framework_output,
> >>>>>> 110                                 "mca:rmaps mapping not given - using byslot");
> >>>>>> 111             ORTE_SET_MAPPING_POLICY(map->mapping, ORTE_MAPPING_BYSLOT);
> >>>>>> 112         } else {
> >>>>>> 113             opal_output_verbose(5, orte_rmaps_base_framework.framework_output,
> >>>>>> 114                                 "mca:rmaps mapping not given - using bysocket");
> >>>>>> 115             ORTE_SET_MAPPING_POLICY(map->mapping, ORTE_MAPPING_BYSOCKET);
> >>>>>> 116         }
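
The default-selection logic in the excerpt above reduces to a single threshold on the process count. The sketch below restates it as a standalone function. The enum and function names are hypothetical, and it only mirrors the branch shown in the excerpt from this snapshot.

```c
#include <assert.h>

/* Hypothetical policy names for illustration; not the ORTE constants. */
typedef enum { MAP_BYSLOT, MAP_BYSOCKET } map_policy_t;

/*
 * Default mapping policy when the user gave none:
 * byslot for one or two procs, bysocket otherwise.
 */
static map_policy_t default_policy(int nprocs)
{
    assert(nprocs >= 0);
    return (nprocs <= 2) ? MAP_BYSLOT : MAP_BYSOCKET;
}
```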
> >>>>>>
> >>>>>> Regards,
> >>>>>> Tetsuya Mishima
> >>>>>>
> >>>>>> On Jan 26, 2014, at 3:02 PM, tmishima_at_[hidden] wrote:
> >>>>>>
> >>>>>>
> >>>>>> Hi Ralph,
> >>>>>>
> >>>>>> I tried the latest nightly snapshot, openmpi-1.7.4rc2r30425.tar.gz.
> >>>>>> Almost everything works fine, except that unexpected output appears,
> >>>>>> as shown below:
> >>>>>>
> >>>>>> [mishima_at_node04 ~]$ mpirun -cpus-per-proc 4 ~/mis/openmpi/demos/myprog
> >>>>>> App launch reported: 3 (out of 3) daemons - 8 (out of 12) procs
> >>>>>> ...
> >>>>>>
> >>>>>> You dropped the if-statement checking "orte_report_launch_progress"
> >>>>>> in plm_base_receive.c @ r30423, which causes the problem.
> >>>>>>
> >>>>>> --- orte/mca/plm/base/plm_base_receive.c.org 2014-01-25 11:51:59.000000000 +0900
> >>>>>> +++ orte/mca/plm/base/plm_base_receive.c 2014-01-26 12:20:10.000000000 +0900
> >>>>>> @@ -315,9 +315,11 @@
> >>>>>>              /* record that we heard back from a daemon during app launch */
> >>>>>>              if (running && NULL != jdata) {
> >>>>>>                  jdata->num_daemons_reported++;
> >>>>>> -                if (0 == jdata->num_daemons_reported % 100 ||
> >>>>>> -                    jdata->num_daemons_reported == orte_process_info.num_procs) {
> >>>>>> -                    ORTE_ACTIVATE_JOB_STATE(jdata, ORTE_JOB_STATE_REPORT_PROGRESS);
> >>>>>> +                if (orte_report_launch_progress) {
> >>>>>> +                    if (0 == jdata->num_daemons_reported % 100 ||
> >>>>>> +                        jdata->num_daemons_reported == orte_process_info.num_procs) {
> >>>>>> +                        ORTE_ACTIVATE_JOB_STATE(jdata, ORTE_JOB_STATE_REPORT_PROGRESS);
> >>>>>> +                    }
> >>>>>>                  }
> >>>>>>              }
> >>>>>>              /* prepare for next job */
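
The patch above wraps the existing reporting condition in a check of the progress flag. As a rough sketch, the combined condition can be written as one predicate. The `should_report_progress` function and its parameter names are hypothetical and stand in for the ORTE variables in the diff.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether to emit a launch-progress report.
 * Nothing is reported unless the user enabled progress reporting;
 * when enabled, report every 100th daemon and when all have reported.
 */
static bool should_report_progress(bool report_enabled, int reported, int total)
{
    assert(reported >= 0 && total >= 0);
    if (!report_enabled) {
        return false;
    }
    return (reported % 100 == 0) || (reported == total);
}
```

Without the outer flag check, the case where `reported == total` fires on every launch, which matches the unexpected "App launch reported" line shown earlier in the message.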
> >>>>>>
> >>>>>> Regards,
> >>>>>> Tetsuya Mishima
> >>>>>>
> >>>>>> P.S. It would also be better to change line 65 in NEWS.
> >>>>>>
> >>>>>> ...
> >>>>>> 64 * Mapping:
> >>>>>> 65 * if #procs <= 2, default to map-by core -> map-by slot
> >>>>>> ^^^^^^^^^^^
> >>>>>> 66 * if #procs > 2, default to map-by socket
> >>>>>> ...
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> users mailing list
> >>>>>> users_at_[hidden]
> >>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users