
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] openmpi-1.7.4rc2r30425 produces unexpected output
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-01-27 23:06:31


I'm afraid it is too late for 1.7.4, as I have locked that release down, barring any last-second smoke-test failures. I'll give this some thought for 1.7.5, but I'm a little leery of the proposed change. The problem is that ppr comes in through a different MCA param than the "map-by" param, so we can indeed get conflicts that we cannot resolve.

This is one of those issues that I need to clean up in general. We've deprecated a number of params due to similar problems, and the "ppr" policy is the last one on the list. It needs to somehow be folded into the "map-by" param, though it also influences the number of procs (unlike the other map-by directives).

On Jan 27, 2014, at 7:46 PM, tmishima_at_[hidden] wrote:

>
>
> Hi Ralph, it seems you are rounding the final turn to release 1.7.4!
> I hope this will be my final request for openmpi-1.7.4 as well.
>
> I mostly use rr_mapper but sometimes use ppr_mapper. I have a simple
> request to improve its usability: I propose removing the
> redefining-policy check in rmaps_ppr_component.c (lines 130-138):
>
> 130     if (ORTE_MAPPING_GIVEN & ORTE_GET_MAPPING_DIRECTIVE(orte_rmaps_base.mapping)) {
> 131         /* if a non-default mapping is already specified, then we
> 132          * have an error
> 133          */
> 134         orte_show_help("help-orte-rmaps-base.txt", "redefining-policy", true, "mapping",
> 135                        "PPR", orte_rmaps_base_print_mapping(orte_rmaps_base.mapping));
> 136         ORTE_SET_MAPPING_DIRECTIVE(orte_rmaps_base.mapping, ORTE_MAPPING_CONFLICTED);
> 137         return ORTE_ERR_SILENT;
> 138     }
>
> The reasons are as follows:
>
> 1) The final mapper to be used should be selected by the priority set
> by the system or an MCA param. The ppr_priority is fixed at 90, while
> the rr_priority can be set by an MCA param (default = 10).
>
> 2) If we set "rmaps_base_mapping_policy = something" in
> mca-params.conf, the -ppr option is always rejected by this check, as
> shown below:
> [mishima_at_manage demos]$ mpirun -np 2 -ppr 1:socket ~/mis/openmpi/demos/myprog
> --------------------------------------------------------------------------
> Conflicting directives for mapping policy are causing the policy
> to be redefined:
>
> New policy: PPR
> Prior policy: BYSOCKET
>
> Please check that only one policy is defined.
>
> 3) As far as I have been able to confirm, this fix does not affect
> any other behavior.
>
> Regards,
> Tetsuya Mishima
>
>> Kewl - thanks!
>>
>> On Jan 27, 2014, at 4:08 PM, tmishima_at_[hidden] wrote:
>>
>>>
>>>
>>> Thanks, Ralph. I quickly checked the fix. It worked fine for me.
>>>
>>> Tetsuya Mishima
>>>
>>>> I fixed that in today's final cleanup
>>>>
>>>> On Jan 27, 2014, at 3:17 PM, tmishima_at_[hidden] wrote:
>>>>
>>>>
>>>>
>>>> As for the NEWS - it is actually already correct. We default to map-by
>>>> core, not slot, as of 1.7.4.
>>>>
>>>> Is that correct? As far as I can tell from the source code, map-by slot
>>>> is used if np <= 2.
>>>>
>>>> [mishima_at_manage openmpi-1.7.4rc2r30425]$ cat -n orte/mca/rmaps/base/rmaps_base_map_job.c
>>>> ...
>>>> 107         /* default based on number of procs */
>>>> 108         if (nprocs <= 2) {
>>>> 109             opal_output_verbose(5, orte_rmaps_base_framework.framework_output,
>>>> 110                                 "mca:rmaps mapping not given - using byslot");
>>>> 111             ORTE_SET_MAPPING_POLICY(map->mapping, ORTE_MAPPING_BYSLOT);
>>>> 112         } else {
>>>> 113             opal_output_verbose(5, orte_rmaps_base_framework.framework_output,
>>>> 114                                 "mca:rmaps mapping not given - using bysocket");
>>>> 115             ORTE_SET_MAPPING_POLICY(map->mapping, ORTE_MAPPING_BYSOCKET);
>>>> 116         }
>>>>
>>>> Regards,
>>>> Tetsuya Mishima
>>>>
>>>> On Jan 26, 2014, at 3:02 PM, tmishima_at_[hidden] wrote:
>>>>
>>>>
>>>> Hi Ralph,
>>>>
>>>> I tried the latest nightly snapshot, openmpi-1.7.4rc2r30425.tar.gz.
>>>> Almost everything works fine, except that unexpected output appears,
>>>> as shown below:
>>>>
>>>> [mishima_at_node04 ~]$ mpirun -cpus-per-proc 4 ~/mis/openmpi/demos/myprog
>>>> App launch reported: 3 (out of 3) daemons - 8 (out of 12) procs
>>>> ...
>>>>
>>>> You dropped the if-statement checking "orte_report_launch_progress" in
>>>> plm_base_receive.c @ r30423, which causes the problem.
>>>>
>>>> --- orte/mca/plm/base/plm_base_receive.c.org	2014-01-25 11:51:59.000000000 +0900
>>>> +++ orte/mca/plm/base/plm_base_receive.c	2014-01-26 12:20:10.000000000 +0900
>>>> @@ -315,9 +315,11 @@
>>>>              /* record that we heard back from a daemon during app launch */
>>>>              if (running && NULL != jdata) {
>>>>                  jdata->num_daemons_reported++;
>>>> -                if (0 == jdata->num_daemons_reported % 100 ||
>>>> -                    jdata->num_daemons_reported == orte_process_info.num_procs) {
>>>> -                    ORTE_ACTIVATE_JOB_STATE(jdata, ORTE_JOB_STATE_REPORT_PROGRESS);
>>>> +                if (orte_report_launch_progress) {
>>>> +                    if (0 == jdata->num_daemons_reported % 100 ||
>>>> +                        jdata->num_daemons_reported == orte_process_info.num_procs) {
>>>> +                        ORTE_ACTIVATE_JOB_STATE(jdata, ORTE_JOB_STATE_REPORT_PROGRESS);
>>>> +                    }
>>>>                  }
>>>>              }
>>>>              /* prepare for next job */
>>>>
>>>> Regards,
>>>> Tetsuya Mishima
>>>>
>>>> P.S. It would also be better to change line 65 in NEWS:
>>>>
>>>> ...
>>>> 64 * Mapping:
>>>> 65 * if #procs <= 2, default to map-by core -> map-by slot
>>>>                                 ^^^^^^^^^^^
>>>> 66 * if #procs > 2, default to map-by socket
>>>> ...
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users