On Sun, Aug 22, 2010 at 9:57 PM, Randolph Pullen <
randolph_pullen_at_[hidden]> wrote:
> It's a long shot, but could it be related to the total data volume?
> i.e. 524288 * 80 = 41943040 bytes active in the cluster
>
> Can you exceed this 41943040 data volume with a smaller message repeated
> more often or a larger one less often?
>
Not so far, so your diagnosis could be right. The failures have been at the
following data volumes (in bytes):
41.9E6
4.1E6
8.2E6
Unfortunately, I'm not sure I can change the repeat rate with the OFED/MPI
tests. Can I do that? I didn't see a suitable flag.
In any case, assuming it is related to the total data volume, what could be
causing such a failure?
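
For reference, here is a rough sketch of the arithmetic behind the suggested
experiment: for a given message size, how many repeats would push the total
past the 524288 * 80 = 41943040 byte figure quoted above. The function names
and the range of message sizes are hypothetical, chosen only for illustration;
this is not tied to any OFED/MPI test flag, and treating 80 as a repeat count
is an assumption.

```python
# Total from the quoted message: 524288 bytes * 80 = 41943040 bytes.
THRESHOLD_BYTES = 524288 * 80


def total_volume(message_bytes, repeats):
    """Total data volume in bytes for a message sent `repeats` times."""
    return message_bytes * repeats


def repeats_to_exceed(message_bytes, threshold=THRESHOLD_BYTES):
    """Smallest repeat count whose total volume is strictly above threshold."""
    return threshold // message_bytes + 1


if __name__ == "__main__":
    # Hypothetical message sizes: powers of two from 64 KiB to 4 MiB.
    for exp in range(16, 23):
        size = 2 ** exp
        n = repeats_to_exceed(size)
        print(f"{size:>8} bytes x {n:>4} repeats = {total_volume(size, n)} bytes")
```

If the failure tracks total volume rather than message size, each of these
size/repeat pairs should behave the same way.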
--
Rahul