I have a question about collective communication. With debug mode
enabled, I can see a lot of information during execution, such as which
algorithm is used. However, I would like to force a specific algorithm
(the simplest one, I suppose). I am profiling some applications and want
to simulate them with another program, so I need to know exactly what,
for example, MPI_Allreduce is doing. I saw that many algorithms are
selected depending on the message size and the number of processes, so I
would like to know:
1) How can I tell Open MPI to use a simple algorithm for allreduce? (Is
there a way to request the simplest algorithm for all collective
operations?) Basically, I would like to know the root CPU for every
collective operation. What are the disadvantages of requesting the
simplest algorithm?
2) Is there any overhead from having built Open MPI in debug mode, even
if I run a program without passing any --mca flags?
3) How would you describe allreduce in words? Can we say that the root
CPU performs a reduce and then a broadcast? I mean, is that accurate for
your implementation? I saw that which CPU is the root depends on the
algorithm, so is it possible to select an algorithm for which I know
that the CPU with rank 0 is always the root?
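
To make question 3 concrete, here is the model I have in mind, written
as a small Python sketch that uses plain lists instead of MPI calls. The
function names (reduce_to_root, broadcast_from_root, allreduce_model) are
my own and hypothetical, and the sketch only models the result each rank
ends up with under a "reduce to rank 0, then broadcast" scheme. I am not
assuming this is how Open MPI actually implements allreduce.

```python
from functools import reduce
import operator


def reduce_to_root(values, op=operator.add, root=0):
    """Model of a reduce: values[i] is the contribution of rank i.

    Returns one buffer per rank; only the root holds the combined result,
    every other rank holds None.
    """
    result = reduce(op, values)
    return [result if rank == root else None for rank in range(len(values))]


def broadcast_from_root(buffers, root=0):
    """Model of a broadcast: every rank receives a copy of the root's buffer."""
    return [buffers[root] for _ in buffers]


def allreduce_model(values, op=operator.add, root=0):
    """Assumed model of allreduce: reduce to the root, then broadcast."""
    return broadcast_from_root(reduce_to_root(values, op, root), root)


if __name__ == "__main__":
    contributions = [1, 2, 3, 4]           # one value per rank, ranks 0..3
    print(allreduce_model(contributions))  # prints [10, 10, 10, 10]
```

If the real algorithm behaves like this with rank 0 as the root, my
simulator only needs to model one reduce followed by one broadcast.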
Thanks a lot,