On Mon, Nov 22, 2010 at 04:40:14PM -0700, James Overfelt wrote:
> I have a small test case where a file created with MPI_File_open
> is still open at the time MPI_Finalize is called. In the actual
> program there are lots of open files and it would be nice to avoid the
> resulting "Your MPI job will now abort." by either having MPI_Finalize
> close the files or honor the error handler and return an error code
> without an abort.
> I've tried with OpenMPI 1.4.3 and 1.5 with the same results.
> Attached are the configure, compile and source files and the whole
> program follows.
Under MPICH2, this simple test program does not abort. You leak a lot
of resources (e.g. the allocated info structure is never freed), but it
sounds like you are well aware of that.
Under OpenMPI, this test program fails because OpenMPI is trying to
help you out. I'm going to need some help from the OpenMPI folks
here, but the backtrace makes it look like MPI_Finalize sets the
"no more MPI calls allowed" flag and then goes and calls some MPI
routines to clean up the opened files:
Breakpoint 1, 0xb7f7c346 in PMPI_Barrier () from /home/robl/work/soft/openmpi-1.4/lib/libmpi.so.0
#0 0xb7f7c346 in PMPI_Barrier () from /home/robl/work/soft/openmpi-1.4/lib/libmpi.so.0
#1 0xb78a4c25 in mca_io_romio_dist_MPI_File_close () from /home/robl/work/soft/openmpi-1.4/lib/openmpi/mca_io_romio.so
#2 0xb787e8b3 in mca_io_romio_file_close () from /home/robl/work/soft/openmpi-1.4/lib/openmpi/mca_io_romio.so
#3 0xb7f591b1 in file_destructor () from /home/robl/work/soft/openmpi-1.4/lib/libmpi.so.0
#4 0xb7f58f28 in ompi_file_finalize () from /home/robl/work/soft/openmpi-1.4/lib/libmpi.so.0
#5 0xb7f67eb3 in ompi_mpi_finalize () from /home/robl/work/soft/openmpi-1.4/lib/libmpi.so.0
#6 0xb7f82828 in PMPI_Finalize () from /home/robl/work/soft/openmpi-1.4/lib/libmpi.so.0
#7 0x0804f9c2 in main (argc=1, argv=0xbfffed94) at file_error.cc:17
Why is there an MPI_Barrier in the close path? It has to do with our
implementation of shared file pointers. If you run this test on a file system
that does not support shared file pointers (PVFS, for example), you might get
a little further.
So, I think the ball is back in the OpenMPI court: they have to
re-jigger the order of the destructors so that closing files comes a
little earlier in the shutdown process.
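To make the ordering problem concrete, here is a toy model in plain C++.
It does not use MPI at all, and every name in it (ToyRuntime, barrier,
finalize_flag_first, finalize_files_first, finalize_succeeds) is made up
for illustration; none of them are OpenMPI or ROMIO internals. It only
assumes what the backtrace suggests: the close path calls a barrier, and
a barrier is rejected once the "finalized" flag is set.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Toy stand-in for an MPI runtime. Hypothetical names throughout.
struct ToyRuntime {
    bool finalized = false;               // models the "no more MPI calls allowed" flag
    std::vector<std::string> open_files;  // files the application never closed

    // Models a public MPI entry point such as the barrier in the close path.
    void barrier() const {
        if (finalized) throw std::runtime_error("MPI call after finalize");
    }

    // Models a collective file close that synchronizes with a barrier.
    void close_all_files() {
        while (!open_files.empty()) {
            barrier();
            open_files.pop_back();
        }
    }

    // Order the backtrace appears to show: set the flag, then clean up files.
    void finalize_flag_first() {
        finalized = true;
        close_all_files();  // throws if any file is still open
    }

    // Suggested order: clean up files while MPI calls are still allowed.
    void finalize_files_first() {
        close_all_files();
        finalized = true;
    }
};

// Returns true if the chosen shutdown order completes with all files closed.
bool finalize_succeeds(bool files_first) {
    ToyRuntime rt;
    rt.open_files = {"a.dat", "b.dat"};
    try {
        if (files_first) rt.finalize_files_first();
        else rt.finalize_flag_first();
    } catch (const std::runtime_error&) {
        return false;
    }
    return rt.open_files.empty();
}
```

In this sketch finalize_succeeds(false) hits the error path as soon as
the first leftover file is closed, while finalize_succeeds(true) shuts
down cleanly. The only difference between the two is when the flag is
set relative to the file cleanup.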
Mathematics and Computer Science Division
Argonne National Lab, IL USA