
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] nonblocking MPI_File_iwrite() does block?
From: Christoph Rackwitz (rackwitz_at_[hidden])
Date: 2009-11-23 05:17:45

Rob Latham wrote:
> The standard in no way requires any overlap for either the nonblocking
> communication or I/O routines. There are long and heated discussions
> about "strict" or "weak" interpretation of the progress rule and which
> one is "better".

Unfortunate. But with your "official" statement, I can now put that
issue behind me. Thanks :)

> If you want asynchronous nonblocking I/O, you might have to roll all
> the way back to LAM or MPICH-1.2.7, when ROMIO used its own request
> objects and test/wait routines on top of the aio routines.

> What if you moved your MPI_File_write call into a thread? There are
> several ways to do this: you could use standard generalized requests
> and make progress with a thread -- the
> application writer has a lot more knowledge about the systems and how
> best to allocate threads.

The funny thing is that my code was supposed to be an instructive demo
of MPI's asynchronous I/O APIs and their functionality. It's basically
number crunching on a matrix, distributed in stripes across ranks. The
parallel I/O would have all ranks write their stripe into one file, done
with a subarray data type.
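For concreteness, the shape of that demo is roughly the following. This is a sketch, not the actual code: the function and variable names (`write_stripe`, `nrows`, `ncols`) are mine, and it assumes a row-stripe decomposition that divides evenly across ranks.

```c
/* Sketch: each rank writes its row stripe of a global nrows x ncols
 * matrix into one shared file via a subarray filetype. Illustrative
 * only -- names and the even decomposition are assumptions. */
#include <mpi.h>

void write_stripe(MPI_Comm comm, const char *path,
                  int nrows, int ncols, double *stripe)
{
    int rank, nprocs;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);

    int rows_per_rank = nrows / nprocs;          /* assume it divides evenly */
    int sizes[2]    = { nrows, ncols };          /* global matrix */
    int subsizes[2] = { rows_per_rank, ncols };  /* this rank's stripe */
    int starts[2]   = { rank * rows_per_rank, 0 };

    MPI_Datatype filetype;
    MPI_Type_create_subarray(2, sizes, subsizes, starts,
                             MPI_ORDER_C, MPI_DOUBLE, &filetype);
    MPI_Type_commit(&filetype);

    MPI_File fh;
    MPI_File_open(comm, path, MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);
    MPI_File_set_view(fh, 0, MPI_DOUBLE, filetype, "native", MPI_INFO_NULL);

    MPI_Request req;
    MPI_File_iwrite(fh, stripe, rows_per_rank * ncols, MPI_DOUBLE, &req);
    /* ... number crunching here; whether any I/O actually overlaps it
     * is implementation-defined, per the progress-rule discussion above */
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Type_free(&filetype);
}
```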

Adding some kind of threading would be practical and would perform
better, but it wouldn't show off MPI's I/O APIs. I'd rather keep the
code as simple as it is, so people see the "other" benefits of MPI's
APIs: they're higher-level and more convenient than rolling everything
by hand.

> If I may ask a slightly different question: you've got periods of I/O
> and periods of computation. Have you evaluated collective I/O?

I thought about it and I know a way to make it happen, but I put that
on the "to do" pile for possible improvements later, after I'd gotten
the asynchronous I/O working. My file format contains a struct followed
by two matrices (same dimensions). Right now, I write the header via
rank 0 and then each rank writes one stripe for each matrix, leaving
two requests pending. I gather that I'd need to construct one or two
more datatypes for split-collective I/O to be applicable, i.e., so the
whole write happens in one call.
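For the record, the split-collective shape I have in mind is roughly the following. `combined_type` is a placeholder for the yet-to-be-built datatype covering a rank's stripe in *both* matrices, and `header_size` for the struct at the front of the file:

```c
/* Split-collective sketch (MPI_File_write_all_begin/end).
 * 'combined_type' is hypothetical: it would describe this rank's
 * stripe in BOTH matrices, so the whole write happens in one call. */
#include <mpi.h>

void write_stripes_split(MPI_File fh, MPI_Offset header_size,
                         MPI_Datatype combined_type,
                         double *buf, int count)
{
    MPI_Status status;
    MPI_File_set_view(fh, header_size, MPI_DOUBLE, combined_type,
                      "native", MPI_INFO_NULL);
    MPI_File_write_all_begin(fh, buf, count, MPI_DOUBLE);
    /* compute between begin and end; as with iwrite, any overlap
     * with the I/O is not guaranteed by the standard */
    MPI_File_write_all_end(fh, buf, &status);
}
```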

> I
> know you are eager to hide I/O in the background -- to get it for free
> -- but there's no such thing as a free lunch. Background I/O might
> still perturb your computation phase, unless you make zero MPI calls
> in your computational phase. Collective I/O can bring some fairly
> powerful optimizations to the table and reduce your overall I/O costs,
> perhaps even reducing them enough that you no longer miss true
> asynchronous I/O ?

I'll give that a try then.
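As a starting point, I'd swap the two pending iwrite requests for one blocking collective per matrix, something like this (assuming the subarray view from the existing code is already set):

```c
/* Collective variant of one stripe write: same file view as the
 * nonblocking version, but MPI_File_write_all lets the implementation
 * aggregate and reorder I/O across ranks (e.g. two-phase I/O). */
#include <mpi.h>

void write_stripe_collective(MPI_File fh, double *stripe, int count)
{
    MPI_Status status;
    /* assumes MPI_File_set_view was already called with the
     * subarray filetype for this rank's stripe */
    MPI_File_write_all(fh, stripe, count, MPI_DOUBLE, &status);
}
```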