I am doing research on parallel techniques for shared-memory
(NUMA) systems. I understand that OpenMPI is intelligent enough to exploit
shared memory and that it uses processor affinity. Is the OpenMPI
design of MPI_Allreduce the same for shared-memory (NUMA) systems as for
distributed systems? Can someone please briefly describe the design of
MPI_Allreduce in terms of processes and their interaction over shared memory?
Otherwise, please suggest a good reference for this.