Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] "bind-to l3cache" with r30643 in ticket #4240 doesn't work
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-02-11 22:17:09


Guess I disagree - it isn't a question of what the code can handle, but rather user expectation. If you specify a definite number of cores for each process, then we have to bind to core in order to meet that directive. Binding to numa won't do it as the OS will continue to schedule the proc on only one core at a time.

So I think the current behavior is correct.
Ralph
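For readers of the archive, the two behaviors under discussion can be sketched with command lines in the style used elsewhere in this thread (treat these as illustrative; exact option spellings vary across Open MPI 1.7.x releases, and myprog is just a placeholder program):

```shell
# pe=N reserves N cpus per process; with no explicit policy, mpirun
# binds each process to cores so it actually occupies its reserved cpus:
mpirun -np 2 -map-by slot:pe=2 -report-bindings ./myprog

# the case Tetsuya proposes allowing: pe=N combined with an explicit,
# coarser binding policy such as numa:
mpirun -np 2 -map-by slot:pe=2 -bind-to numa -report-bindings ./myprog
```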

On Feb 11, 2014, at 7:13 PM, Tetsuya Mishima <tmishima_at_[hidden]> wrote:

> Your fix worked for me, thanks.
>
> By the way, I noticed that "bind-to obj" is forcibly overridden by "bind-to core" when pe=N is specified.
> This is just my opinion, but I think that's too conservative and a kind of regression from openmpi-1.6.5. For example, "-map-by slot:pe=N -bind-to numa" looks
> acceptable to me. Your round_robin_mapper is now robust enough to handle it. The patch below would be better. Please give it a try.
>
> --- orte/mca/rmaps/base/rmaps_base_frame.c.org 2014-02-11 17:34:36.000000000 +0900
> +++ orte/mca/rmaps/base/rmaps_base_frame.c 2014-02-12 11:01:42.000000000 +0900
> @@ -393,13 +393,13 @@
> * bind to those cpus - any other binding policy is an
> * error
> */
> - if (!(OPAL_BIND_GIVEN & OPAL_GET_BINDING_POLICY(opal_hwloc_binding_policy))) {
> + if (OPAL_BIND_TO_NONE == OPAL_GET_BINDING_POLICY(opal_hwloc_binding_policy)) {
> if (opal_hwloc_use_hwthreads_as_cpus) {
> OPAL_SET_BINDING_POLICY(opal_hwloc_binding_policy, OPAL_BIND_TO_HWTHREAD);
> } else {
> OPAL_SET_BINDING_POLICY(opal_hwloc_binding_policy, OPAL_BIND_TO_CORE);
> }
> - } else {
> + } else if (OPAL_BIND_TO_L1CACHE < OPAL_GET_BINDING_POLICY(opal_hwloc_binding_policy)) {
> if (opal_hwloc_use_hwthreads_as_cpus) {
> if (OPAL_BIND_TO_HWTHREAD != OPAL_GET_BINDING_POLICY(opal_hwloc_binding_policy)) {
> orte_show_help("help-orte-rmaps-base.txt", "mismatch-binding", true,
> Regards,
> Tetsuya Mishima
>
>> Okay, I fixed it. Keep getting caught by a very, very unfortunate design flaw in hwloc that forces you to treat caches as a special case that requires you to
>> call a different function. So you have to constantly protect function calls into hwloc with "if cache, call this one - else, call that one". REALLY irritating, and
>> it caught us again here.
>>
>> Should be fixed in trunk now, set to go over to 1.7.5
>>
>> Thanks
>> Ralph
>>
>> On Feb 11, 2014, at 4:47 PM, tmishima_at_[hidden] wrote:
>>
>>>
>>> Hi Ralph,
>>>
>>> Since the ticket #4240 has been already set as fixed, I'm sending this
>>> email to you. (I don't know whether I could add comments to the fixed ticket.)
>>>
>>> When I tried to bind the processes to l3cache, it didn't work, as shown below:
>>> (the host manage has the normal topology - not inverted)
>>>
>>> [mishima_at_manage openmpi-1.7.4]$ mpirun -np 2 -bind-to l3cache
>>> -report-bindings ~/mis/openmpi/demos/myprog
>>> --------------------------------------------------------------------------
>>> No objects of the specified type were found on at least one node:
>>>
>>> Type: Cache
>>> Node: manage
>>>
>>> The map cannot be done as specified.
>>> --------------------------------------------------------------------------
>>>
>>> "-bind-to l1cache/l2cache" doesn't work either. At least, I confirmed that
>>> openmpi-1.7.4 works with "-bind-to l3cache".
>>>
>>> Regards,
>>> Tetsuya Mishima
>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
> ----
> Tetsuya Mishima tmishima_at_[hidden]