I've seen this behaviour with MUMPS on shared-memory machines as well,
using MPI. I use the iterative refinement capability to sharpen the
last few digits of the solution (2 or 3 iterations is usually enough).
If you're not using it, give it a try; it will probably reduce the
noise you're getting in your results. The quality of the answer from a
direct solve is highly dependent on the matrix scaling and pivot order,
and it's easy to get differences in the last few digits. MUMPS itself
is also asynchronous and might not be completely deterministic in how
it solves if the MPI processes run in a different order from one run
to the next.
George Bosilca wrote:
> This is a problem of numerical stability, and there is no solution for
> such a problem in MPI. Usually, preconditioning the input matrix
> improves the numerical stability.
> If you read the MPI standard, there is a __short__ section about what
> guarantees the MPI collective communications provide. There is only
> one: if you run the same collective twice, on the same set of nodes
> with the same input data, you will get the same output. In fact, the
> main problem is that MPI considers all the default operations (MPI_Op)
> as being commutative and associative, which is usually the case in the
> real world but not when floating-point rounding is involved. When you
> increase the number of nodes, the data are split into smaller pieces,
> which means more operations have to be done to achieve the
> reduction, i.e. more rounding errors might occur, and so on.
> On May 27, 2009, at 11:16, vasilis wrote:
>>> Rank 0 accumulates all the res_cpu values into a single array, res. It
>>> starts with its own res_cpu and then adds all other processes. When
>>> np=2, that means the order is prescribed. When np>2, the order is no
>>> longer prescribed and some floating-point rounding variations can start
>>> to occur.
>> Yes, you are right. Now, the question is why these floating-point
>> rounding variations would occur for np>2. It cannot be due to a
>> non-prescribed order!!
>>> If you want results to be more deterministic, you need to fix the order
>>> in which res is aggregated. E.g., instead of using MPI_ANY_SOURCE,
>>> loop over the peer processes in a specific order.
>>> P.S. It seems to me that you could use MPI collective operations to
>>> implement what you're doing. E.g., something like:
>> I could use these operations for the res variable (would it make it
>> any faster?), but I cannot use them for the other 3 variables.
>> users mailing list