Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] "An error occurred in MPI_Recv" with more than 2 CPU
From: vasilis (gkanis_at_[hidden])
Date: 2009-05-28 03:26:11

> This is a problem of numerical stability, and there is no solution for
> such a problem in MPI. Usually, preconditioning the input matrix
> improves the numerical stability.

It could be a numerical stability problem, but that would imply that I have an
ill-conditioned matrix, which is not the case here.

> If you read the MPI standard, there is a __short__ section about what
> guarantees the MPI collective communications provide. There is only
> one: if you run the same collective twice, on the same set of nodes
> with the same input data, you will get the same output. In fact the
> main problem is that MPI considers all default operations (MPI_OP) as
> being commutative and associative, which is usually the case in the real
> world but not when floating-point rounding is involved. When you
> increase the number of nodes, the data will be spread in smaller
> pieces, which means more operations will have to be done in order to
> achieve the reduction, i.e. more rounding errors might occur and so on.
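
To make the associativity point concrete, here is a minimal sketch in plain
Python (no MPI involved; the three values are arbitrary illustrations, not data
from this thread). It only shows that regrouping a floating-point sum can change
the last bits of the result:

```python
# Floating-point addition is commutative but not associative:
# regrouping the same three terms changes the rounded result.
a, b, c = 0.1, 0.2, 0.3  # illustrative values only

left = (a + b) + c   # add a and b first
right = a + (b + c)  # add b and c first

print(left == right)  # False
print(abs(left - right))  # tiny, on the order of machine epsilon
```

The difference is at the level of rounding error, which is the size of
discrepancy one would expect from a reordered reduction.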

You would have a point if I saw these small differences in both matrices.
I am solving the system Ax=b with the MUMPS libraries. The construction of the
matrix A and the column vector b is distributed among np CPUs. The matrix A is
the same whether I use 2 CPUs or np CPUs. The vector b changes slightly if
I use more than 2 CPUs.

My data are not spread in smaller pieces!! I am using the FEM to solve the
system of equations, and I use MPI to partition the domain. Therefore, the
data (i.e., the vector of unknowns) is the same on all the CPUs, and each CPU
constructs a portion of the matrices A and b. Then, on the host CPU, I add all
these pieces into A and b.
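
For reference, the accumulation pattern described above can be simulated without
MPI. This is only a sketch under stated assumptions: the per-rank values are
hypothetical and deliberately extreme, `accumulate` is a made-up helper, and the
"arrival order" stands in for what receiving with MPI_ANY_SOURCE might produce.
Rank 0 always starts with its own piece, and the other pieces are added in
whatever order they arrive:

```python
import itertools

# Hypothetical contributions of each rank to one entry of b.
# Values are chosen to exaggerate rounding; they are not from a real run.
contributions = {0: 1.0e16, 1: 1.0, 2: -1.0e16, 3: 1.0}

def accumulate(order):
    """Sum the contributions in the given rank order, starting from zero."""
    total = 0.0
    for rank in order:
        total += contributions[rank]
    return total

# Rank 0 adds its own piece first; the peers may arrive in any order.
peers = [rank for rank in contributions if rank != 0]
results = {
    accumulate((0,) + arrival)
    for arrival in itertools.permutations(peers)
}

# Receiving from the peers in a fixed rank order gives one reproducible value.
fixed = accumulate(sorted(contributions))

print(sorted(results))  # more than one distinct sum
print(fixed)
```

With only one peer (np=2) there is a single possible order, so the sum is always
the same; with more peers the set of possible sums can grow, and looping over
the peers in a fixed order removes that variation.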

Thank you,

> Thanks,
> george.
> On May 27, 2009, at 11:16 , vasilis wrote:
> >> Rank 0 accumulates all the res_cpu values into a single array, res. It
> >> starts with its own res_cpu and then adds all other processes. When
> >> np=2, that means the order is prescribed. When np>2, the order is no
> >> longer prescribed and some floating-point rounding variations can start
> >> to occur.
> >
> > Yes, you are right. Now, the question is why would these floating-point
> > rounding variations occur for np>2? It cannot be due to the order not
> > being prescribed!!
> >
> >> If you want results to be more deterministic, you need to fix the order
> >> in which res is aggregated. E.g., instead of using MPI_ANY_SOURCE, loop
> >> over the peer processes in a specific order.
> >>
> >> P.S. It seems to me that you could use MPI collective operations to
> >> implement what you're doing. E.g., something like:
> >
> > I could use these operations for the res variable (will it make the
> > summation any faster?). But I cannot use them for the other 3 variables.