
Subject: Re: [OMPI users] openmpi-1.7.4rc2r30425 produces unexpected output
From: tmishima_at_[hidden]
Date: 2014-01-27 23:18:18


Thank you for your comment, Ralph.

I understand your explanation, including that "it's too late".
The ppr option is convenient for us because our environment is quite
heterogeneous. (It gives us flexibility in the number of procs.)

I hope you do not deprecate ppr in a future release, and that you apply my
proposal someday.
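
For example (just an illustrative sketch, using the -ppr syntax that appears
later in this thread), the same command line adapts the proc count to each
node's hardware:

  mpirun -ppr 1:socket ~/mis/openmpi/demos/myprog

This launches one proc per socket on every allocated node, with no -np
required.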

Regards,
Tetsuya Mishima

> I'm afraid it is too late for 1.7.4 as I have locked that down, barring
> any last-second smoke test failures. I'll give this some thought for 1.7.5,
> but I'm a little leery of the proposed change. The problem is that ppr
> comes in through a different MCA param than the "map-by" param, and hence
> we can indeed get conflicts that we cannot resolve.
>
> This is one of those issues that I need to clean up in general. We've
> deprecated a number of params due to similar problems - the "ppr" policy
> is the last one on the list. It needs to somehow be folded into the
> "map-by" param, though it also influences the number of procs (unlike the
> other map-by directives).
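>
> [Illustration: combining the two routes - the mapping MCA param and the
> -ppr option, both of which appear later in this thread - reproduces the
> unresolvable conflict:
>
>   mpirun --mca rmaps_base_mapping_policy socket -ppr 1:socket -np 2 ./myprog
>
> Each route tries to set the mapping policy, and neither can safely win.]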
>
>
> On Jan 27, 2014, at 7:46 PM, tmishima_at_[hidden] wrote:
>
> >
> >
> > Hi Ralph, it seems you are rounding the final turn to release 1.7.4!
> > I hope this will be my final request for openmpi-1.7.4 as well.
> >
> > I mostly use rr_mapper but sometimes use ppr_mapper. I have a simple
> > request to ask you to improve its usability. Namely, I propose to
> > remove the redefining-policy check routine in rmaps_ppr_component.c
> > (lines 130-138):
> >
> >    130     if (ORTE_MAPPING_GIVEN & ORTE_GET_MAPPING_DIRECTIVE(orte_rmaps_base.mapping)) {
> >    131         /* if a non-default mapping is already specified, then we
> >    132          * have an error
> >    133          */
> >    134         orte_show_help("help-orte-rmaps-base.txt", "redefining-policy", true, "mapping",
> >    135                        "PPR", orte_rmaps_base_print_mapping(orte_rmaps_base.mapping));
> >    136         ORTE_SET_MAPPING_DIRECTIVE(orte_rmaps_base.mapping, ORTE_MAPPING_CONFLICTED);
> >    137         return ORTE_ERR_SILENT;
> >    138     }
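> >
> > Expressed as a patch, the proposal is just this deletion (a sketch; the
> > file path assumes the ppr component's directory, and the hunk header
> > assumes no context lines):
> >
> > --- a/orte/mca/rmaps/ppr/rmaps_ppr_component.c
> > +++ b/orte/mca/rmaps/ppr/rmaps_ppr_component.c
> > @@ -130,9 +129,0 @@
> > -    if (ORTE_MAPPING_GIVEN & ORTE_GET_MAPPING_DIRECTIVE(orte_rmaps_base.mapping)) {
> > -        /* if a non-default mapping is already specified, then we
> > -         * have an error
> > -         */
> > -        orte_show_help("help-orte-rmaps-base.txt", "redefining-policy", true, "mapping",
> > -                       "PPR", orte_rmaps_base_print_mapping(orte_rmaps_base.mapping));
> > -        ORTE_SET_MAPPING_DIRECTIVE(orte_rmaps_base.mapping, ORTE_MAPPING_CONFLICTED);
> > -        return ORTE_ERR_SILENT;
> > -    }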
> >
> > The reasons are as follows:
> >
> > 1) The final mapper to be used should be selected by the priority set
> > by the system or an MCA param. The ppr_priority is fixed at 90 and the
> > rr_priority can be set by an MCA param (default = 10).
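> >
> > For example (the param name is an assumption, following the usual
> > <framework>_<component>_priority convention):
> >
> >   mpirun --mca rmaps_round_robin_priority 100 -ppr 1:socket ./myprog
> >
> > would let the rr mapper (priority 100) outrank ppr (fixed at 90), so the
> > selection could be decided by priority instead of the conflict check.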
> >
> > 2) If we set "rmaps_base_mapping_policy = something" in
> > mca-params.conf, the -ppr option is always refused by this check, as
> > below:
> >
> > [mishima_at_manage demos]$ mpirun -np 2 -ppr 1:socket ~/mis/openmpi/demos/myprog
> > --------------------------------------------------------------------------
> > Conflicting directives for mapping policy are causing the policy
> > to be redefined:
> >
> >   New policy:   PPR
> >   Prior policy: BYSOCKET
> >
> > Please check that only one policy is defined.
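> >
> > For reference, an illustrative mca-params.conf entry of the kind that
> > triggers this (the value is inferred from the "BYSOCKET" prior policy
> > shown above):
> >
> >   rmaps_base_mapping_policy = socket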
> >
> > 3) This fix does not seem to affect any other behavior, as far as
> > I have confirmed.
> >
> > Regards,
> > Tetsuya Mishima
> >
> >> Kewl - thanks!
> >>
> >> On Jan 27, 2014, at 4:08 PM, tmishima_at_[hidden] wrote:
> >>
> >>>
> >>>
> >>> Thanks, Ralph. I quickly checked the fix. It worked fine for me.
> >>>
> >>> Tetsuya Mishima
> >>>
> >>>> I fixed that in today's final cleanup
> >>>>
> >>>> On Jan 27, 2014, at 3:17 PM, tmishima_at_[hidden] wrote:
> >>>>
> >>>>
> >>>>
> >>>> As for the NEWS - it is actually already correct. We default to
> >>>> map-by core, not slot, as of 1.7.4.
> >>>>
> >>>> Is that correct? As far as I can tell from the source code, map-by
> >>>> slot is used if np <= 2.
> >>>>
> >>>> [mishima_at_manage openmpi-1.7.4rc2r30425]$ cat -n orte/mca/rmaps/base/rmaps_base_map_job.c
> >>>> ...
> >>>>    107         /* default based on number of procs */
> >>>>    108         if (nprocs <= 2) {
> >>>>    109             opal_output_verbose(5, orte_rmaps_base_framework.framework_output,
> >>>>    110                                 "mca:rmaps mapping not given - using byslot");
> >>>>    111             ORTE_SET_MAPPING_POLICY(map->mapping, ORTE_MAPPING_BYSLOT);
> >>>>    112         } else {
> >>>>    113             opal_output_verbose(5, orte_rmaps_base_framework.framework_output,
> >>>>    114                                 "mca:rmaps mapping not given - using bysocket");
> >>>>    115             ORTE_SET_MAPPING_POLICY(map->mapping, ORTE_MAPPING_BYSOCKET);
> >>>>    116         }
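> >>>>
> >>>> This default can be watched directly (a sketch; verbose level 5 matches
> >>>> the opal_output_verbose calls above):
> >>>>
> >>>> [mishima_at_manage demos]$ mpirun -np 2 --mca rmaps_base_verbose 5 ~/mis/openmpi/demos/myprog
> >>>> (expect "mca:rmaps mapping not given - using byslot" in the output)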
> >>>>
> >>>> Regards,
> >>>> Tetsuya Mishima
> >>>>
> >>>> On Jan 26, 2014, at 3:02 PM, tmishima_at_[hidden] wrote:
> >>>>
> >>>>
> >>>> Hi Ralph,
> >>>>
> >>>> I tried latest nightly snapshots of openmpi-1.7.4rc2r30425.tar.gz.
> >>>> Almost everything works fine, except that the unexpected output
> >>>> appears as below:
> >>>>
> >>>> [mishima_at_node04 ~]$ mpirun -cpus-per-proc 4 ~/mis/openmpi/demos/myprog
> >>>> App launch reported: 3 (out of 3) daemons - 8 (out of 12) procs
> >>>> ...
> >>>>
> >>>> You dropped the if-statement checking "orte_report_launch_progress" in
> >>>> plm_base_receive.c @ r30423, which causes the problem.
> >>>>
> >>>> --- orte/mca/plm/base/plm_base_receive.c.org   2014-01-25 11:51:59.000000000 +0900
> >>>> +++ orte/mca/plm/base/plm_base_receive.c       2014-01-26 12:20:10.000000000 +0900
> >>>> @@ -315,9 +315,11 @@
> >>>>              /* record that we heard back from a daemon during app launch */
> >>>>              if (running && NULL != jdata) {
> >>>>                  jdata->num_daemons_reported++;
> >>>> -                if (0 == jdata->num_daemons_reported % 100 ||
> >>>> -                    jdata->num_daemons_reported == orte_process_info.num_procs) {
> >>>> -                    ORTE_ACTIVATE_JOB_STATE(jdata, ORTE_JOB_STATE_REPORT_PROGRESS);
> >>>> +                if (orte_report_launch_progress) {
> >>>> +                    if (0 == jdata->num_daemons_reported % 100 ||
> >>>> +                        jdata->num_daemons_reported == orte_process_info.num_procs) {
> >>>> +                        ORTE_ACTIVATE_JOB_STATE(jdata, ORTE_JOB_STATE_REPORT_PROGRESS);
> >>>> +                    }
> >>>>                  }
> >>>>              }
> >>>>              /* prepare for next job */
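> >>>>
> >>>> With the guard restored, the progress lines should only appear on
> >>>> request, e.g. (assuming the MCA param name matches the guard variable):
> >>>>
> >>>> mpirun --mca orte_report_launch_progress 1 -cpus-per-proc 4 ~/mis/openmpi/demos/myprog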
> >>>>
> >>>> Regards,
> >>>> Tetsuya Mishima
> >>>>
> >>>> P.S. It would also be better to change line 65 of NEWS:
> >>>>
> >>>> ...
> >>>>  64 * Mapping:
> >>>>  65 *   if #procs <= 2, default to map-by core    <- should read "map-by slot"
> >>>>  66 *   if #procs > 2, default to map-by socket
> >>>> ...
> >>>>
_______________________________________________
users mailing list
users_at_[hidden]
http://www.open-mpi.org/mailman/listinfo.cgi/users