Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] binding with MCA parameters: broken or user error?
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-10-09 21:28:53


Try adding -display-devel-map to your command line so you can see
what OMPI thinks the binding and mapping policies are set to. That
will tell you whether the problem is in the mapping or in the daemon
binding.

Also, it might help to know something about this node, such as how
many sockets it has and how many cores per socket.
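
To make the mapping-versus-binding distinction concrete, here is a toy
model of the two mapping orders at issue. It is only a sketch, not
ORTE's actual mapper: the function name map_ranks is hypothetical, and
the topology of 2 sockets with 4 cores each is an assumption inferred
from the cpu masks in the output quoted below, not something the
original message states.

```python
def map_ranks(num_ranks, num_sockets, cores_per_socket, by_socket):
    """Return the socket index each rank lands on (toy model only)."""
    placement = []
    for rank in range(num_ranks):
        if by_socket:
            # Round-robin: consecutive ranks alternate across sockets.
            socket = rank % num_sockets
        else:
            # Fill order: use every core of a socket before moving on.
            socket = (rank // cores_per_socket) % num_sockets
        placement.append(socket)
    return placement


# Assumed topology: 2 sockets x 4 cores (hypothetical, see note above).
print(map_ranks(5, 2, 4, by_socket=False))  # [0, 0, 0, 0, 1]
print(map_ranks(5, 2, 4, by_socket=True))   # [0, 1, 0, 1, 0]
```

Under those assumptions, the fill order reproduces the socket sequence
reported as PROBLEM 2 below, and the round-robin order reproduces the
sequence seen once -bysocket is given explicitly. That would be
consistent with the binding setting taking effect while the mapping
setting does not.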

On Oct 8, 2009, at 11:17 PM, Eugene Loh wrote:

> Here are two problems with openmpi-1.3.4a1r22051
>
> # Here, I try to run the moral equivalent of -bysocket -bind-to-socket,
> # using the MCA parameter form specified on the mpirun command line.
> # No binding results. THIS IS PROBLEM 1.
> % mpirun -np 5 --mca rmaps_base_schedule_policy socket --mca orte_process_binding socket -report-bindings hostname
> saem9
> saem9
> saem9
> saem9
> saem9
>
> # Same thing with the "core" form.
> % mpirun -np 5 --mca rmaps_base_schedule_policy core --mca orte_process_binding core -report-bindings hostname
> saem9
> saem9
> saem9
> saem9
> saem9
>
> # Now, I set the MCA parameters as environment variables.
> # I then check the spellings and confirm all is set using ompi_info.
> % setenv OMPI_MCA_rmaps_base_schedule_policy socket
> % setenv OMPI_MCA_orte_process_binding socket
> % ompi_info -a | grep rmaps_base_schedule_policy
> MCA rmaps: parameter "rmaps_base_schedule_policy" (current value: "socket", data source: environment)
> % ompi_info -a | grep orte_process_binding
> MCA orte: parameter "orte_process_binding" (current value: "socket", data source: environment)
>
> # So, now I run a simple program.
> # I get binding now, but I'm filling up the first socket before going to the second.
> # THIS IS PROBLEM 2.
> % mpirun -np 5 -report-bindings hostname
> [saem9:23947] [[29741,0],0] odls:default:fork binding child [[29741,1],0] to socket 0 cpus 000f
> [saem9:23947] [[29741,0],0] odls:default:fork binding child [[29741,1],1] to socket 0 cpus 000f
> [saem9:23947] [[29741,0],0] odls:default:fork binding child [[29741,1],2] to socket 0 cpus 000f
> [saem9:23947] [[29741,0],0] odls:default:fork binding child [[29741,1],3] to socket 0 cpus 000f
> [saem9:23947] [[29741,0],0] odls:default:fork binding child [[29741,1],4] to socket 1 cpus 00f0
> saem9
> saem9
> saem9
> saem9
> saem9
>
> # Adding -bysocket to the command line fixes things.
> % mpirun -np 5 -bysocket -report-bindings hostname
> [saem9:23953] [[29751,0],0] odls:default:fork binding child [[29751,1],0] to socket 0 cpus 000f
> [saem9:23953] [[29751,0],0] odls:default:fork binding child [[29751,1],1] to socket 1 cpus 00f0
> [saem9:23953] [[29751,0],0] odls:default:fork binding child [[29751,1],2] to socket 0 cpus 000f
> [saem9:23953] [[29751,0],0] odls:default:fork binding child [[29751,1],3] to socket 1 cpus 00f0
> [saem9:23953] [[29751,0],0] odls:default:fork binding child [[29751,1],4] to socket 0 cpus 000f
> saem9
> saem9
> saem9
> saem9
> saem9
>
> Bug? Or am I doing something wrong?
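
For readers decoding the "cpus" field in the -report-bindings output
quoted above, the following is a minimal sketch. It assumes the field is
a hexadecimal bitmask in which bit i stands for logical CPU i; that
reading is an assumption rather than something the thread confirms, and
the helper name cpus_in_mask is hypothetical.

```python
def cpus_in_mask(mask_hex):
    """List the bit positions set in a hexadecimal CPU mask string."""
    mask = int(mask_hex, 16)
    # Assumption: bit i set means logical CPU i is in the binding set.
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


print(cpus_in_mask("000f"))  # [0, 1, 2, 3]
print(cpus_in_mask("00f0"))  # [4, 5, 6, 7]
```

Read this way, "socket 0 cpus 000f" and "socket 1 cpus 00f0" would each
cover four logical CPUs, which is what suggests a node with two sockets
of four cores each.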