Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Help tracing cause of readv errors
From: Ashley Pittman (ashley_at_[hidden])
Date: 2009-11-25 06:41:55


On Wed, 2009-11-25 at 12:36 +0100, Atle Rudshaug wrote:

> I got a similar error when using non-blocking communication on large
> datasets. I could not figure out why this was happening, since it seemed
> sort of random. I eventually bypassed the problem by switching to
> blocking communication, which felt kind of sad...If anyone knows if this
> is a bug in OpenMPI or connected to hardware somehow, please share.

You could easily be running out of memory on one node by saturating it
with messages, all of which may need to be buffered. Have you checked
the offending nodes for messages from the OOM killer?
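
As a rough illustration of that check, here is a minimal sketch that filters
kernel log lines for signs of the OOM killer. The function name
find_oom_lines and the two matched phrases are assumptions made for this
example; the exact wording the kernel logs varies between versions, so the
match is deliberately loose and may need adjusting for your nodes.

```python
import re

# Phrases the Linux kernel commonly logs when the OOM killer fires.
# Assumption: wording differs across kernel versions, so match loosely
# and case-insensitively rather than on one exact message.
OOM_PATTERN = re.compile(r"oom-killer|out of memory", re.IGNORECASE)


def find_oom_lines(log_lines):
    """Return the log lines that look like OOM killer activity.

    log_lines is any iterable of strings, for example the lines of
    saved `dmesg` output collected from one node.
    """
    return [line.rstrip("\n") for line in log_lines if OOM_PATTERN.search(line)]
```

One way to use it is to save the output of dmesg (or the kernel section of
the syslog) from each suspect node to a file and pass the open file object
to find_oom_lines. An empty result does not rule out memory exhaustion, but
any hit names the process that was killed and is a strong hint.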

Ashley,

-- 
Ashley Pittman, Bath, UK.
Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk