Open MPI Development Mailing List Archives

Subject: [OMPI devel] binding with MCA parameters: broken or user error?
From: Eugene Loh (Eugene.Loh_at_[hidden])
Date: 2009-10-09 01:17:16


Here are two problems with openmpi-1.3.4a1r22051:

# Here, I try to run the moral equivalent of -bysocket -bind-to-socket,
# using the MCA parameter form specified on the mpirun command line.
# No binding results. THIS IS PROBLEM 1.
% mpirun -np 5 --mca rmaps_base_schedule_policy socket --mca orte_process_binding socket -report-bindings hostname
saem9
saem9
saem9
saem9
saem9

# Same thing with the "core" form.
% mpirun -np 5 --mca rmaps_base_schedule_policy core --mca orte_process_binding core -report-bindings hostname
saem9
saem9
saem9
saem9
saem9

# Now, I set the MCA parameters as environment variables.
# I then check the spellings and confirm all is set using ompi_info.
% setenv OMPI_MCA_rmaps_base_schedule_policy socket
% setenv OMPI_MCA_orte_process_binding socket
% ompi_info -a | grep rmaps_base_schedule_policy
               MCA rmaps: parameter "rmaps_base_schedule_policy" (current value: "socket", data source: environment)
% ompi_info -a | grep orte_process_binding
                MCA orte: parameter "orte_process_binding" (current value: "socket", data source: environment)

# So, now I run a simple program.
# I get binding now, but I'm filling up the first socket before going to the second.
# THIS IS PROBLEM 2.
% mpirun -np 5 -report-bindings hostname
[saem9:23947] [[29741,0],0] odls:default:fork binding child [[29741,1],0] to socket 0 cpus 000f
[saem9:23947] [[29741,0],0] odls:default:fork binding child [[29741,1],1] to socket 0 cpus 000f
[saem9:23947] [[29741,0],0] odls:default:fork binding child [[29741,1],2] to socket 0 cpus 000f
[saem9:23947] [[29741,0],0] odls:default:fork binding child [[29741,1],3] to socket 0 cpus 000f
[saem9:23947] [[29741,0],0] odls:default:fork binding child [[29741,1],4] to socket 1 cpus 00f0
saem9
saem9
saem9
saem9
saem9
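
For reading the "cpus" field in the report lines above: it looks like a hexadecimal bitmask with one bit per cpu, so 000f would cover cpus 0-3 and 00f0 cpus 4-7. That reading is an assumption based on the output shown here, not on the Open MPI source. A minimal shell sketch that expands such a mask (the function name mask_to_cpus is made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: expand a hex cpu mask, as printed in the report lines,
# into the list of cpu indices whose bits are set.
# Assumes bit N of the mask corresponds to cpu N.
mask_to_cpus() {
    mask=$((0x$1))      # parse the argument as a hexadecimal number
    cpu=0
    out=""
    while [ "$mask" -gt 0 ]; do
        if [ $((mask & 1)) -eq 1 ]; then
            out="$out $cpu"
        fi
        mask=$((mask >> 1))
        cpu=$((cpu + 1))
    done
    echo "${out# }"     # drop the leading space
}

mask_to_cpus 000f   # prints: 0 1 2 3
mask_to_cpus 00f0   # prints: 4 5 6 7
```

Under that assumption, ranks 0 through 3 share the four cpus of socket 0 and only rank 4 lands on socket 1.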

# Adding -bysocket to the command line fixes things.
% mpirun -np 5 -bysocket -report-bindings hostname
[saem9:23953] [[29751,0],0] odls:default:fork binding child [[29751,1],0] to socket 0 cpus 000f
[saem9:23953] [[29751,0],0] odls:default:fork binding child [[29751,1],1] to socket 1 cpus 00f0
[saem9:23953] [[29751,0],0] odls:default:fork binding child [[29751,1],2] to socket 0 cpus 000f
[saem9:23953] [[29751,0],0] odls:default:fork binding child [[29751,1],3] to socket 1 cpus 00f0
[saem9:23953] [[29751,0],0] odls:default:fork binding child [[29751,1],4] to socket 0 cpus 000f
saem9
saem9
saem9
saem9
saem9
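
The two runs above can be compared by counting how many ranks end up on each socket. A rough sketch, assuming the report lines have been captured and unwrapped to one line per rank (the function name count_by_socket is made up for illustration, and the sample input is the -bysocket run shown above):

```shell
#!/bin/sh
# Hypothetical helper: count ranks bound to each socket, reading
# report lines (one per rank) on stdin.
count_by_socket() {
    awk '/binding child/ {
             # the socket number is the field right after the word "socket"
             for (i = 1; i < NF; i++)
                 if ($i == "socket") n[$(i + 1)]++
         }
         END { for (s in n) print "socket " s ": " n[s] }' | sort
}

count_by_socket <<'EOF'
[saem9:23953] [[29751,0],0] odls:default:fork binding child [[29751,1],0] to socket 0 cpus 000f
[saem9:23953] [[29751,0],0] odls:default:fork binding child [[29751,1],1] to socket 1 cpus 00f0
[saem9:23953] [[29751,0],0] odls:default:fork binding child [[29751,1],2] to socket 0 cpus 000f
[saem9:23953] [[29751,0],0] odls:default:fork binding child [[29751,1],3] to socket 1 cpus 00f0
[saem9:23953] [[29751,0],0] odls:default:fork binding child [[29751,1],4] to socket 0 cpus 000f
EOF
# prints:
# socket 0: 3
# socket 1: 2
```

Fed the earlier run without -bysocket, the same count comes out as 4 ranks on socket 0 and 1 on socket 1, which is the fill-first behavior described as problem 2.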

Bug? Or am I doing something wrong?