Thanks, I understand what you are saying. But my question is about the
design of MPI_Allreduce for shared-memory systems. I mean, is there any
different logic/design for MPI_Allreduce when Open MPI is used on
a shared-memory system?
As I understand it, the standard MPI_Allreduce works as follows:
1. Each MPI process sends its value (and WAITs for the others to send).
2. The values from all the processes are combined.
3. The computed result is sent back to all processes (all LEAVE).
Does Open MPI implement the same logic/design on a shared-memory system, or
does it have some other way of doing it for shared memory?
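To make the three-step model above concrete, here is a minimal sketch in
Python. It is only an illustration of the semantics I described, not Open
MPI's actual implementation: threads stand in for MPI processes, rank 0 is
an arbitrary choice for the combining step, and simulate_allreduce is a
hypothetical name of my own.

```python
import threading


def simulate_allreduce(values, op=lambda a, b: a + b):
    """Simulate the three-step allreduce model.

    Assumption: each thread stands in for one MPI process (rank), and
    Python lists stand in for a shared-memory segment.
    """
    n = len(values)
    contributions = [None] * n  # shared buffer, one slot per rank
    combined = [None]           # written once by the combining rank
    results = [None] * n        # what each rank holds when it leaves
    barrier = threading.Barrier(n)

    def rank_main(rank):
        # Step 1: contribute a value, then WAIT for all other ranks.
        contributions[rank] = values[rank]
        barrier.wait()
        # Step 2: one rank (0 here, an arbitrary choice) combines the values.
        if rank == 0:
            acc = contributions[0]
            for v in contributions[1:]:
                acc = op(acc, v)
            combined[0] = acc
        barrier.wait()
        # Step 3: every rank reads the combined result and LEAVEs.
        results[rank] = combined[0]

    threads = [threading.Thread(target=rank_main, args=(r,)) for r in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(simulate_allreduce([1, 2, 3, 4]))  # prints [10, 10, 10, 10]
```

The two barriers correspond to the WAIT in step 1 and to the point where
the result becomes available in step 3. A real implementation may well use
a different pattern (a tree or ring, for example) and still return the same
result to every process.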
Quoting "Yuan, Huapeng" <yuanh_at_[hidden]>:
> I think it has nothing to do with shared memory. It just has
> to do with processes versus threads.
> So, between processes, you can use MPI on shared memory (multicore or
> distributed shared memory). But between multiple threads in the same
> process, it cannot be used.
> Hope this helps.
> Quoting smairal_at_[hidden]:
> > Can anyone help on this?
> > -Thanks,
> > Sarang.
> > Quoting smairal_at_[hidden]:
> >> Hi,
> >> I am doing research on parallel techniques for shared-memory
> >> (NUMA) systems. I understand that Open MPI is intelligent enough to
> >> utilize shared memory and that it uses processor affinity. Is the
> >> design of MPI_Allreduce the same for shared-memory (NUMA) systems as
> >> for distributed systems? Can someone please explain MPI_Allreduce in
> >> brief, in terms of processes and their interaction on shared memory?
> >> Otherwise, please suggest a good reference for this.
> >> -Thanks,
> >> Sarang.
> >> _______________________________________________
> >> users mailing list
> >> users_at_[hidden]
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users