Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] "An error occurred in MPI_Recv" with more than 2 CPU
From: vasilis (gkanis_at_[hidden])
Date: 2009-05-29 03:12:59


> The original issue, still reflected by the subject heading of this e-mail,
> was that a message overran its receive buffer. That was fixed by using
> tags to distinguish different kinds of messages (res, jacob, row, and col).
>
> I thought the next problem was the small (10^-10) variations in results
> when np>2. In my mind, a plausible explanation for this is that you're
> adding the "res_cpu" contributions from all the various processes to the
> "res" array in some arbitrary order. The contribution from rank 0 is added
> in first, but all the others come in in some nondeterministic order. Since
> you're using finite-precision arithmetic, this can lead to tiny round-off
> variations.
>
> If you want to get rid of those minor variations, you have to perform
> floating-point arithmetic in a particular order.

Unfortunately, it did not work. I replaced "MPI_ANY_SOURCE" with "JW", but
I did not see any difference.
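
For reference, the round-off effect described in the quoted text can be reproduced without MPI at all. The following is only a minimal sketch in plain Python, not the code from this thread: the helper accumulate() and the three contribution values are hypothetical, and each rank's "res_cpu" is reduced to a single number. It adds the same contributions into "res" in two different arrival orders.

```python
# Illustration only: shows that the order of floating-point additions
# can change the result. This is not the poster's actual program.

def accumulate(res_cpu, order):
    """Add the per-rank contributions into res in the given rank order."""
    res = 0.0
    for rank in order:
        res += res_cpu[rank]
    return res

# Hypothetical contributions, one value per rank.
res_cpu = [0.1, 0.2, 0.3]

# Contributions summed in rank order 0, 1, 2.
res_a = accumulate(res_cpu, [0, 1, 2])

# The same contributions, but arriving in the order 1, 2, 0.
res_b = accumulate(res_cpu, [1, 2, 0])

# The two sums differ in the last bits even though the inputs are identical.
print(res_a == res_b, abs(res_a - res_b))

# Using the same order every time gives a bit-for-bit reproducible sum.
print(accumulate(res_cpu, [0, 1, 2]) == res_a)
```

In an MPI program the analogous approach would be to receive from each source rank in a fixed loop order rather than in arrival order, so that every run adds the contributions in the same sequence.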