I have an algorithm that recursively generates trees of different sizes. At the moment I only have the sequential version of it.
Here we have 4 identical computers, each node with an 8-core Xeon and 4 GB of RAM. They have HyperThreading, so each node counts as 16 processors.
So I can launch a total of 64 parallel threads (4 nodes x 16).
My question is: what would be the best approach when using MPI?
Using -np 64 may not be a good idea, because I would not be taking advantage of the proximity of cores on the same node, which could speed up memory operations. It might be better to have 4 MPI processes (one per node), each of which spawns 15 threads locally. (I can mix MPI with local threads, right?)
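For the "one MPI process per node plus local threads" layout, here is a minimal sketch of the node-local half only. It assumes the work is many independent trees of different sizes; `build_tree`, `mix` and `build_forest_threaded` are hypothetical names, and the tree shape is made up to stand in for the real generator. Threads claim trees one at a time from a shared atomic counter, so a thread that drew a small tree simply grabs the next one. The MPI calls are left out so the sketch runs without an MPI install. In a hybrid program each rank would presumably process its own range of tree indices and combine the partial counts with `MPI_Reduce`, and MPI has to be started with `MPI_Init_thread` (e.g. `MPI_THREAD_FUNNELED` if only the main thread makes MPI calls).

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical hash, used only to make the example trees irregular. */
static unsigned long mix(unsigned long x) {
    x ^= x >> 13;
    x *= 0x5bd1e995UL;
    x ^= x >> 15;
    return x;
}

/* Hypothetical stand-in for the real recursive generator: builds the tree
   rooted at node `id` and returns its node count. Each node gets 0..3
   children depending on its id, so trees end up with different sizes. */
static long build_tree(unsigned long id, int depth, int max_depth) {
    long nodes = 1;
    if (depth >= max_depth) return nodes;
    int children = (int)(mix(id) % 4);
    for (int i = 0; i < children; i++)
        nodes += build_tree(id * 4 + (unsigned long)i + 1, depth + 1, max_depth);
    return nodes;
}

typedef struct {
    atomic_long next_tree;   /* index of the next tree nobody has claimed */
    atomic_long total_nodes; /* running sum of node counts */
    long n_trees;
    int max_depth;
} forest_work_t;

/* Each thread repeatedly claims one tree and builds it (dynamic scheduling). */
static void *worker(void *arg) {
    forest_work_t *w = arg;
    for (;;) {
        long t = atomic_fetch_add(&w->next_tree, 1);
        if (t >= w->n_trees) break;
        atomic_fetch_add(&w->total_nodes, build_tree((unsigned long)t, 0, w->max_depth));
    }
    return NULL;
}

/* Reference version: build every tree in the calling thread. */
long build_forest_sequential(long n_trees, int max_depth) {
    long total = 0;
    for (long t = 0; t < n_trees; t++)
        total += build_tree((unsigned long)t, 0, max_depth);
    return total;
}

/* Builds trees 0..n_trees-1 with n_threads threads (caller included). */
long build_forest_threaded(long n_trees, int max_depth, int n_threads) {
    assert(n_threads >= 1);
    forest_work_t w;
    atomic_init(&w.next_tree, 0);
    atomic_init(&w.total_nodes, 0);
    w.n_trees = n_trees;
    w.max_depth = max_depth;

    /* Spawn n_threads - 1 helpers; the calling thread works too. */
    int helpers = n_threads - 1;
    pthread_t *tids = malloc(sizeof *tids * (size_t)(helpers > 0 ? helpers : 1));
    int started = 0;
    if (tids != NULL)
        while (started < helpers &&
               pthread_create(&tids[started], NULL, worker, &w) == 0)
            started++;

    worker(&w); /* all trees get built even if no helper could be started */
    for (int i = 0; i < started; i++) pthread_join(tids[i], NULL);
    free(tids);
    return atomic_load(&w.total_nodes);
}
```

Compiled with something like `cc -std=c11 -pthread`, `build_forest_threaded(200, 12, 16)` should return the same count as `build_forest_sequential(200, 12)`; only the order in which trees are built changes.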
I don't have much experience with MPI; I have only programmed larger algorithms in CUDA, which is much easier.
Any suggestions or help are welcome.