Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] How to avoid abort when calling MPI_Finalize without calling MPI_File_close?
From: James Overfelt (overfelt_at_[hidden])
Date: 2010-12-01 13:00:19


On Wed, Dec 1, 2010 at 8:28 AM, Rob Latham <robl_at_[hidden]> wrote:
> On Mon, Nov 22, 2010 at 04:40:14PM -0700, James Overfelt wrote:
>> Hello,
>>
>>     I have a small test case where a file created with MPI_File_open
>> is still open at the time MPI_Finalize is called.  In the actual
>> program there are lots of open files, and it would be nice to avoid the
>> resulting "Your MPI job will now abort." by having MPI_Finalize either
>> close the files or honor the error handler and return an error code
>> without aborting.
>>
>>   I've tried this with OpenMPI 1.4.3 and 1.5, with the same results.
>> Attached are the configure, compile and source files and the whole
>> program follows.
>
> Under MPICH2, this simple test program does not abort.  You leak a lot
> of resources (e.g. the allocated info structure is never freed), but it
> sounds like you are well aware of that.
>
> Under Open MPI, this test program fails because Open MPI is trying to
> help you out.  I'm going to need some help from the Open MPI folks
> here, but the backtrace makes it look like MPI_Finalize sets the
> "no more MPI calls allowed" flag and then calls some MPI routines
> to clean up the still-open files:
>
> Breakpoint 1, 0xb7f7c346 in PMPI_Barrier () from /home/robl/work/soft/openmpi-1.4/lib/libmpi.so.0
> (gdb) where
> #0  0xb7f7c346 in PMPI_Barrier () from /home/robl/work/soft/openmpi-1.4/lib/libmpi.so.0
> #1  0xb78a4c25 in mca_io_romio_dist_MPI_File_close () from /home/robl/work/soft/openmpi-1.4/lib/openmpi/mca_io_romio.so
> #2  0xb787e8b3 in mca_io_romio_file_close () from /home/robl/work/soft/openmpi-1.4/lib/openmpi/mca_io_romio.so
> #3  0xb7f591b1 in file_destructor () from /home/robl/work/soft/openmpi-1.4/lib/libmpi.so.0
> #4  0xb7f58f28 in ompi_file_finalize () from /home/robl/work/soft/openmpi-1.4/lib/libmpi.so.0
> #5  0xb7f67eb3 in ompi_mpi_finalize () from /home/robl/work/soft/openmpi-1.4/lib/libmpi.so.0
> #6  0xb7f82828 in PMPI_Finalize () from /home/robl/work/soft/openmpi-1.4/lib/libmpi.so.0
> #7  0x0804f9c2 in main (argc=1, argv=0xbfffed94) at file_error.cc:17
>
> Why is there an MPI_Barrier in the close path?  It has to do with our
> implementation of shared file pointers.  If you run this test on a file system
> that does not support shared file pointers (PVFS, for example), you might get
> a little further.
>
> So, I think the ball is back in the Open MPI court: they have to
> re-jigger the order of the destructors so that closing files comes a
> little earlier in the shutdown process.
>
> ==rob
>
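
The reproducer attachment is not preserved in this archive; a minimal
sketch of the pattern under discussion might look like the following.
The file name, the leaked info object, and the error-handler call are
illustrative assumptions, not the original source.

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_File fh;
    MPI_Info info;

    MPI_Init(&argc, &argv);

    /* The info object is created but never freed, matching the
       resource leak Rob mentions above. */
    MPI_Info_create(&info);
    MPI_File_open(MPI_COMM_WORLD, "testfile",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

    /* Ask for error codes instead of aborts; per the report above,
       this is not honored during MPI_Finalize's file cleanup. */
    MPI_File_set_errhandler(fh, MPI_ERRORS_RETURN);

    /* Deliberately no MPI_File_close(fh) before finalizing. */
    return MPI_Finalize();
}

With the behavior described above, the MPI_ERRORS_RETURN handler on the
file is not consulted during MPI_Finalize's cleanup, and the job aborts
instead of returning an error code.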

Rob,

  Thank you, that is the answer I was hoping for: I'm not crazy, and
it should be an easy fix.  I'll look through the Open MPI source code
and maybe suggest a patch.

jro
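
Until the destructor ordering changes, the application-side workaround
implied by the thread is to close every open handle before calling
MPI_Finalize.  A minimal sketch, with hypothetical bookkeeping helpers
(tracked_open and close_all_open_files are not MPI routines):

#include <mpi.h>

#define MAX_OPEN_FILES 64

/* Remember every handle we open so they can all be closed
   before MPI_Finalize. */
static MPI_File open_files[MAX_OPEN_FILES];
static int n_open_files = 0;

/* Wrapper around MPI_File_open that records the handle. */
static int tracked_open(MPI_Comm comm, char *name, int amode,
                        MPI_Info info, MPI_File *fh)
{
    int rc = MPI_File_open(comm, name, amode, info, fh);
    if (rc == MPI_SUCCESS && n_open_files < MAX_OPEN_FILES)
        open_files[n_open_files++] = *fh;
    return rc;
}

/* Close every recorded handle.  MPI_File_close is collective over
   the communicator used to open the file, so every rank must make
   matching calls. */
static void close_all_open_files(void)
{
    int i;
    for (i = 0; i < n_open_files; i++)
        MPI_File_close(&open_files[i]);
    n_open_files = 0;
}

Calling close_all_open_files() on every rank immediately before
MPI_Finalize keeps file cleanup out of the backtraced path above.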