Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] binding with MCA parameters: broken or user error?
From: Terry Dontje (Terry.Dontje_at_[hidden])
Date: 2009-10-12 11:19:16


Ralph Castain wrote:
> I fixed the process schedule issue on the trunk over the weekend (not
> moved to 1.3 yet while it "soaked") - the binding issue was working
> fine on the trunk.
So there was an issue of "-mca orte_process_binding" not being interpreted?
>
> I believe I applied the fix to stop calling register_params twice to
> 1.3 already, but I can check.
No, I was asking whether that fix might be causing the
orte_process_binding MCA param to not be interpreted. But from
what you say in the first paragraph, I was probably wrong.

--td
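
For readers following the thread: the expectation behind this question is that a value given with "-mca" on the mpirun command line overrides the OMPI_MCA_* environment variable, which in turn overrides a parameter file. Below is a minimal Python sketch of that lookup order. It is an illustration only, not Open MPI's implementation; the function name, the dict-based sources, and the source labels other than "environment" are hypothetical.

```python
def resolve_mca_param(name, cmdline, environ, param_file, default=None):
    """Return (value, source) for an MCA parameter.

    Lookup order assumed here: command line, then environment
    (OMPI_MCA_<name>), then parameter file, then the default.
    """
    if name in cmdline:
        return cmdline[name], "command line"
    env_key = "OMPI_MCA_" + name
    if env_key in environ:
        return environ[env_key], "environment"
    if name in param_file:
        return param_file[name], "file"
    return default, "default"


# Hypothetical inputs: every source sets the same parameter differently.
param_file = {"orte_process_binding": "none"}
environ = {"OMPI_MCA_orte_process_binding": "socket"}
cmdline = {"orte_process_binding": "core"}

# The command-line value is expected to win when it is present.
print(resolve_mca_param("orte_process_binding", cmdline, environ, param_file))

# Without a command-line value, the environment setting is used.
print(resolve_mca_param("orte_process_binding", {}, environ, param_file))
```

The issue reported in this thread for v1.3 is that the command-line value was not taking effect, which is the opposite of what the first call above models.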

>
> On Oct 12, 2009, at 4:36 AM, Terry Dontje wrote:
>
>> With regard to the "-mca XXX" option not overriding the file setting, I
>> thought I saw this working for v1.3. However, I just retested this
>> and I am seeing the same issue: the "-mca" option does not affect
>> orte_process_binding or rmaps_base_schedule_policy.
>>
>> This seems to work under the trunk. I wonder if the issue might be
>> something we did in r22050 where we stopped calling
>> orte_register_params twice? Not sure exactly why that would have
>> prevented the mca option setting taking place the first time.
>> --td
>>
>> Ralph Castain wrote:
>>> Try adding -display-devel-map to your cmd line so you can see what
>>> OMPI thinks the binding and mapping policy is set to - that'll tell
>>> you if the problem is in the mapping or in the daemon binding.
>>>
>>> Also, it might help to know something about this node - like how
>>> many sockets, cores/socket.
>>>
>>> On Oct 8, 2009, at 11:17 PM, Eugene Loh wrote:
>>>
>>>> Here are two problems with openmpi-1.3.4a1r22051
>>>>
>>>> # Here, I try to run the moral equivalent of -bysocket -bind-to-socket,
>>>> # using the MCA parameter form specified on the mpirun command line.
>>>> # No binding results. THIS IS PROBLEM 1.
>>>> % mpirun -np 5 --mca rmaps_base_schedule_policy socket --mca orte_process_binding socket -report-bindings hostname
>>>> saem9
>>>> saem9
>>>> saem9
>>>> saem9
>>>> saem9
>>>>
>>>> # Same thing with the "core" form.
>>>> % mpirun -np 5 --mca rmaps_base_schedule_policy core --mca orte_process_binding core -report-bindings hostname
>>>> saem9
>>>> saem9
>>>> saem9
>>>> saem9
>>>> saem9
>>>>
>>>> # Now, I set the MCA parameters as environment variables.
>>>> # I then check the spellings and confirm all is set using ompi_info.
>>>> % setenv OMPI_MCA_rmaps_base_schedule_policy socket
>>>> % setenv OMPI_MCA_orte_process_binding socket
>>>> % ompi_info -a | grep rmaps_base_schedule_policy
>>>> MCA rmaps: parameter "rmaps_base_schedule_policy" (current value: "socket", data source: environment)
>>>> % ompi_info -a | grep orte_process_binding
>>>> MCA orte: parameter "orte_process_binding" (current value: "socket", data source: environment)
>>>>
>>>> # So, now I run a simple program.
>>>> # I get binding now, but I'm filling up the first socket before going to the second.
>>>> # THIS IS PROBLEM 2.
>>>> % mpirun -np 5 -report-bindings hostname
>>>> [saem9:23947] [[29741,0],0] odls:default:fork binding child [[29741,1],0] to socket 0 cpus 000f
>>>> [saem9:23947] [[29741,0],0] odls:default:fork binding child [[29741,1],1] to socket 0 cpus 000f
>>>> [saem9:23947] [[29741,0],0] odls:default:fork binding child [[29741,1],2] to socket 0 cpus 000f
>>>> [saem9:23947] [[29741,0],0] odls:default:fork binding child [[29741,1],3] to socket 0 cpus 000f
>>>> [saem9:23947] [[29741,0],0] odls:default:fork binding child [[29741,1],4] to socket 1 cpus 00f0
>>>> saem9
>>>> saem9
>>>> saem9
>>>> saem9
>>>> saem9
>>>>
>>>> # Adding -bysocket to the command line fixes things.
>>>> % mpirun -np 5 -bysocket -report-bindings hostname
>>>> [saem9:23953] [[29751,0],0] odls:default:fork binding child [[29751,1],0] to socket 0 cpus 000f
>>>> [saem9:23953] [[29751,0],0] odls:default:fork binding child [[29751,1],1] to socket 1 cpus 00f0
>>>> [saem9:23953] [[29751,0],0] odls:default:fork binding child [[29751,1],2] to socket 0 cpus 000f
>>>> [saem9:23953] [[29751,0],0] odls:default:fork binding child [[29751,1],3] to socket 1 cpus 00f0
>>>> [saem9:23953] [[29751,0],0] odls:default:fork binding child [[29751,1],4] to socket 0 cpus 000f
>>>> saem9
>>>> saem9
>>>> saem9
>>>> saem9
>>>> saem9
>>>>
>>>> Bug? Or am I doing something wrong?
>>>> _______________________________________________
>>>> devel mailing list
>>>> devel_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
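
The two binding patterns reported in the original message (fill socket 0 first versus alternate between sockets) can be reproduced with a small model. The Python sketch below is only an illustration of the two mapping orders, not Open MPI's mapper. It assumes a node with 2 sockets and 4 cores per socket, which is inferred from the "000f" and "00f0" masks in the output and is not stated in the thread. The function names and the "slot" policy label are hypothetical.

```python
def map_ranks(nprocs, sockets, cores_per_socket, policy):
    """Return the socket index assigned to each rank.

    policy "slot":   fill every core of a socket before moving on
    policy "socket": round-robin ranks across sockets
    """
    if policy == "socket":
        return [rank % sockets for rank in range(nprocs)]
    if policy == "slot":
        # Wrapping around when all cores are used is an assumption.
        return [(rank // cores_per_socket) % sockets for rank in range(nprocs)]
    raise ValueError("unknown policy: " + policy)


def socket_mask(socket, cores_per_socket):
    """Hex CPU mask covering all cores of one socket, e.g. '000f'."""
    bits = ((1 << cores_per_socket) - 1) << (socket * cores_per_socket)
    return format(bits, "04x")


# Assumed layout: 2 sockets, 4 cores each, 5 processes as in the thread.
for policy in ("slot", "socket"):
    print("policy:", policy)
    for rank, sock in enumerate(map_ranks(5, 2, 4, policy)):
        print("  rank", rank, "-> socket", sock, "cpus", socket_mask(sock, 4))
```

Under these assumptions the "slot" order places ranks 0 to 3 on socket 0 and rank 4 on socket 1, as in the PROBLEM 2 output, while the "socket" order alternates sockets, as in the -bysocket run.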