Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] How to avoid abort when calling MPI_Finalize without calling MPI_File_close?
From: James Overfelt (overfelt_at_[hidden])
Date: 2010-12-01 13:00:19


On Wed, Dec 1, 2010 at 8:28 AM, Rob Latham <robl_at_[hidden]> wrote:
> On Mon, Nov 22, 2010 at 04:40:14PM -0700, James Overfelt wrote:
>> Hello,
>>
>>     I have a small test case where a file created with MPI_File_open
>> is still open at the time MPI_Finalize is called.  In the actual
>> program there are lots of open files and it would be nice to avoid the
>> resulting "Your MPI job will now abort." by either having MPI_Finalize
>> close the files or honoring the error handler and returning an error
>> code without an abort.
>>
>>   I've tried with OpenMPI 1.4.3 and 1.5 with the same results.
>> Attached are the configure, compile and source files and the whole
>> program follows.
>
> Under MPICH2, this simple test program does not abort.  You leak a lot
> of resources (e.g. the allocated info structure is never freed), but it
> sounds like you are well aware of that.
>
> Under Open MPI, this test program fails because Open MPI is trying to
> help you out.  I'm going to need some help from the Open MPI folks
> here, but the backtrace makes it look like MPI_Finalize sets the
> "no more MPI calls allowed" flag and then calls some MPI routines to
> clean up the open files:
>
> Breakpoint 1, 0xb7f7c346 in PMPI_Barrier () from /home/robl/work/soft/openmpi-1.4/lib/libmpi.so.0
> (gdb) where
> #0  0xb7f7c346 in PMPI_Barrier () from /home/robl/work/soft/openmpi-1.4/lib/libmpi.so.0
> #1  0xb78a4c25 in mca_io_romio_dist_MPI_File_close () from /home/robl/work/soft/openmpi-1.4/lib/openmpi/mca_io_romio.so
> #2  0xb787e8b3 in mca_io_romio_file_close () from /home/robl/work/soft/openmpi-1.4/lib/openmpi/mca_io_romio.so
> #3  0xb7f591b1 in file_destructor () from /home/robl/work/soft/openmpi-1.4/lib/libmpi.so.0
> #4  0xb7f58f28 in ompi_file_finalize () from /home/robl/work/soft/openmpi-1.4/lib/libmpi.so.0
> #5  0xb7f67eb3 in ompi_mpi_finalize () from /home/robl/work/soft/openmpi-1.4/lib/libmpi.so.0
> #6  0xb7f82828 in PMPI_Finalize () from /home/robl/work/soft/openmpi-1.4/lib/libmpi.so.0
> #7  0x0804f9c2 in main (argc=1, argv=0xbfffed94) at file_error.cc:17
>
> Why is there an MPI_Barrier in the close path?  It has to do with our
> implementation of shared file pointers.  If you run this test on a file system
> that does not support shared file pointers (PVFS, for example), you might get
> a little further.
>
> So, I think the ball is back in the OpenMPI court: they have to
> re-jigger the order of the destructors so that closing files comes a
> little earlier in the shutdown process.
>
> ==rob
>

Rob,

  Thank you, that is the answer I was hoping for: I'm not crazy, and
it should be an easy fix. I'll look through the OpenMPI source code
and maybe suggest a patch.
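
For anyone landing on this thread without the attachment, the test case
boils down to something like the sketch below. This is a reconstruction
rather than the actual attached program, so the file name is just
illustrative; the essential parts are the info object that is never freed
(the leak Rob mentions) and the file handle that is still open when
MPI_Finalize runs. I've also set MPI_ERRORS_RETURN on the handle to match
the "honor the error handler" wish, though as discussed it does not
prevent the abort.

#include <mpi.h>
#include <stdio.h>

/* Sketch of the failing pattern: open a file, never close it, call
   MPI_Finalize.  With Open MPI 1.4.3/1.5 this ends in
   "Your MPI job will now abort." */
int main(int argc, char **argv)
{
    MPI_File fh;
    MPI_Info info;
    int rc;

    MPI_Init(&argc, &argv);

    /* Deliberately never freed, mirroring the resource leak noted above;
       MPI_INFO_NULL would work just as well for reproducing the abort. */
    MPI_Info_create(&info);

    rc = MPI_File_open(MPI_COMM_WORLD, "file_error.dat",
                       MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);
    if (rc != MPI_SUCCESS)
        MPI_Abort(MPI_COMM_WORLD, rc);

    /* Ask for error codes instead of aborts on this handle ... */
    MPI_File_set_errhandler(fh, MPI_ERRORS_RETURN);

    /* ... but deliberately skip MPI_File_close, so the file is still
       open when MPI_Finalize runs. */
    return MPI_Finalize();
}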
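
Separately, and not something discussed in the thread, there may be a
portable workaround until the destructor ordering changes: the MPI
standard requires MPI_Finalize to behave as if MPI_COMM_SELF were freed
first, so a delete callback attached to an MPI_COMM_SELF attribute runs
at the start of MPI_Finalize, while MPI calls are still legal. Hanging
the open file handle off such an attribute should get it closed before
the library's own teardown (and its internal MPI_Barrier) runs. A minimal
sketch, with made-up helper names, assuming the implementation honors the
MPI_COMM_SELF requirement:

#include <mpi.h>

/* Delete callback: invoked at the start of MPI_Finalize, before the
   rest of MPI is torn down, so MPI_File_close is still legal here. */
static int close_file_cb(MPI_Comm comm, int keyval, void *attr_val,
                         void *extra_state)
{
    MPI_File *fh = (MPI_File *)attr_val;
    if (*fh != MPI_FILE_NULL)
        MPI_File_close(fh);   /* resets *fh to MPI_FILE_NULL */
    return MPI_SUCCESS;
}

/* Register 'fh' for closing at finalize; 'fh' must stay valid (e.g. a
   global or other long-lived object) until MPI_Finalize is called. */
static void close_at_finalize(MPI_File *fh)
{
    int keyval;
    MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, close_file_cb,
                           &keyval, NULL);
    MPI_Comm_set_attr(MPI_COMM_SELF, keyval, fh);
}

With lots of open files you would register a single attribute whose value
is a list of handles rather than one keyval per file, but the mechanism
is the same.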

jro