On Tue, 2006-08-15 at 14:24 -0700, Tom Rosmond wrote:
> I am continuing to test the MPI-2 features of 1.1, and have run into
> some puzzling behavior. I wrote a simple F90 program to test 'mpi_put'
> and 'mpi_get' on a coordinate transformation problem on a two dual-core
> processor Opteron workstation running the PGI 6.1 compiler. The program
> runs correctly for a variety of problem sizes and processor counts.
> However, my main interest is a large global weather prediction model
> that has been running in production with 1-sided message passing on an
> SGI Origin 3000 for several years. This code does not run with OMPI
> 1-sided message passing. I have investigated the difference between this
> code and the test program and noticed a critical difference. Both
> programs call 'mpi_win_create' to create an integer 'handle' to the RMA
> window used by 'mpi_put' and 'mpi_get'. In the test program this
> 'handle' returns with a value of '1', but in the large code the 'handle'
> returns with value '0'. Subsequent synchronization calls to
> 'mpi_win_fence' succeed in the small program (error status eq 0), while
> in the large code they fail (error status ne 0), and the transfers fail
> also (no data is passed).
> Do you have any suggestions on what could cause this difference in
> behavior between the two codes, specifically why the 'handles' have
> different values? Are there any diagnostics I could produce that would
> provide information?
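The difference in handle values is very likely related to the failures
you are seeing. Our handle 0 is MPI_WIN_NULL, so you should never see
that returned from a successful MPI_WIN_CREATE. A handle of 0 suggests
that window creation itself failed, in which case the later
MPI_WIN_FENCE calls on that handle would be expected to fail too.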
Unfortunately, when I wrote the one-sided implementation, I didn't add
useful debugging messages the user can enable. I can add some and make
a tarball, if you would be willing to give it a try. What error
messages are coming out of the large code?
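If nothing useful is being printed, one way to get at the error text is
to switch to MPI_ERRORS_RETURN and decode the status with
MPI_ERROR_STRING. Here is a minimal, untested sketch. The program name,
buffer, and sizes are placeholders I made up for illustration, not
anything from your model:

```fortran
program win_diag
  use mpi
  implicit none
  integer, parameter :: n = 1024          ! placeholder window length
  real :: buf(n)                          ! placeholder window buffer
  integer :: win, ierr, ierr2, rank, msglen, realsize
  integer(kind=MPI_ADDRESS_KIND) :: winsize  ! size must be this kind
  character(len=MPI_MAX_ERROR_STRING) :: msg

  call mpi_init(ierr)
  call mpi_comm_rank(MPI_COMM_WORLD, rank, ierr)

  ! Return error codes instead of aborting. Errors from mpi_win_create
  ! are raised on the communicator that is passed to it.
  call mpi_comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN, ierr)

  call mpi_type_size(MPI_REAL, realsize, ierr)
  winsize = int(n, MPI_ADDRESS_KIND) * realsize
  buf = 0.0

  call mpi_win_create(buf, winsize, realsize, MPI_INFO_NULL, &
                      MPI_COMM_WORLD, win, ierr)
  if (ierr /= MPI_SUCCESS .or. win == MPI_WIN_NULL) then
     call mpi_error_string(ierr, msg, msglen, ierr2)
     write(*,*) 'rank ', rank, ': mpi_win_create: ', msg(1:msglen)
     call mpi_abort(MPI_COMM_WORLD, 1, ierr2)
  end if

  ! Errors from later RMA calls are raised on the window itself.
  call mpi_win_set_errhandler(win, MPI_ERRORS_RETURN, ierr)

  call mpi_win_fence(0, win, ierr)
  if (ierr /= MPI_SUCCESS) then
     call mpi_error_string(ierr, msg, msglen, ierr2)
     write(*,*) 'rank ', rank, ': mpi_win_fence: ', msg(1:msglen)
  end if

  call mpi_win_free(win, ierr)
  call mpi_finalize(ierr)
end program win_diag
```

Adding the same checks around the MPI_WIN_CREATE and MPI_WIN_FENCE calls
in the large code should show which call fails first and with what
error class.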
By the way, just to make sure your expectations are set correctly, Open
MPI's one-sided performance in v1.1 and v1.2 is bad, as it's implemented
over the point-to-point engine. You're not going to get Origin-like
performance out of the current implementation.