Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] bug in MPI_File_set_view?
From: Rob Latham (robl_at_[hidden])
Date: 2014-05-19 12:23:32


On 05/15/2014 08:32 AM, Edgar Gabriel wrote:
> could you try, just out of curiosity, forcing the use of OMPIO? e.g.
> mpirun --mca io ompio ....

Edgar, what is in the air that there are now three bug reports against
ROMIO's flattening code in the last month?

We've fixed this upstream in ROMIO by ignoring zero-length blocks, but
George Bosilca suggested that Open MPI's fix for the same problem might
have been too aggressive.

For those of you not on the mpich-discuss list, we've determined that
whatever problem MPICH had with Oriol Canela-Xandri's test case has been
fixed in the latest from-git versions.

OMPIO uses Open MPI's datatype processing, so if both handle zero-length
blocks the same way, everything's fine. ROMIO processes datatypes
internally (providing a third implementation of MPI datatype processing;
sigh). If ROMIO and the MPI library disagree about how to handle these
special cases, memory errors such as the one you report can happen.

==rob

>
> Thanks
> Edgar
>
> On 5/15/2014 3:56 AM, CANELA-XANDRI Oriol wrote:
>> Hi, I installed and tried version 1.8.1, but it also fails. I see the error when some processes do not have any matrix block. It's not a common situation, but it makes me unsure whether I am doing something wrong. The error I get is:
>>
>> *** Error in `./binary': free(): invalid size: 0x0000000000a34c00 ***
>> [oriol-VirtualBox:13975] *** Process received signal ***
>> [oriol-VirtualBox:13975] Signal: Aborted (6)
>> [oriol-VirtualBox:13975] Signal code: (-6)
>> [oriol-VirtualBox:13969] *** Process received signal ***
>> [oriol-VirtualBox:13969] Signal: Aborted (6)
>> [oriol-VirtualBox:13969] Signal code: (-6)
>> ======= Backtrace: =========
>> /lib/x86_64-linux-gnu/libc.so.6(+0x80996)[0x7f5844a8d996]
>> [oriol-VirtualBox:13969] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36ff0)[0x7f06a50a7ff0]
>> [oriol-VirtualBox:13969] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f06a50a7f77]
>> [oriol-VirtualBox:13969] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f06a50ab5e8]
>> [oriol-VirtualBox:13969] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x744fb)[0x7f06a50e54fb]
>> [oriol-VirtualBox:13969] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x80996)[0x7f06a50f1996]
>> [oriol-VirtualBox:13969] [ 5] /usr/local/lib/openmpi/mca_io_romio.so(ADIOI_Delete_flattened+0x62)[0x7f0691e12c02]
>> [oriol-VirtualBox:13969] [ 6] /usr/local/lib/openmpi/mca_io_romio.so(ADIO_Close+0x1f9)[0x7f0691df7189]
>> [oriol-VirtualBox:13969] [ 7] /usr/local/lib/openmpi/mca_io_romio.so(mca_io_romio_dist_MPI_File_close+0xe8)[0x7f0691de9dd8]
>> [oriol-VirtualBox:13969] [ 8] /usr/local/lib/libmpi.so.1(+0x3a2c6)[0x7f06a5ea02c6]
>> [oriol-VirtualBox:13969] [ 9] /usr/local/lib/libmpi.so.1(ompi_file_close+0x41)[0x7f06a5ea0811]
>> [oriol-VirtualBox:13969] [10] /usr/local/lib/libmpi.so.1(PMPI_File_close+0x78)[0x7f06a5edc118]
>> [oriol-VirtualBox:13969] [11] ./binary[0x42099e]
>> [oriol-VirtualBox:13969] [12] ./binary[0x48ed86]
>> [oriol-VirtualBox:13969] [13] ./binary[0x40e49f]
>> [oriol-VirtualBox:13969] [14] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f06a5092de5]
>> [oriol-VirtualBox:13969] [15] ./binary[0x40d679]
>> [oriol-VirtualBox:13969] *** End of error message ***
>> [oriol-VirtualBox:13975] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36ff0)[0x7f1857201ff0]
>> [oriol-VirtualBox:13975] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f1857201f77]
>> [oriol-VirtualBox:13975] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f18572055e8]
>> [oriol-VirtualBox:13975] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x744fb)[0x7f185723f4fb]
>> [oriol-VirtualBox:13975] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x80996)[0x7f185724b996]
>> [oriol-VirtualBox:13975] [ 5] /usr/local/lib/openmpi/mca_io_romio.so(ADIOI_Delete_flattened+0x62)[0x7f18459d2c02]
>> [oriol-VirtualBox:13975] [ 6] /usr/local/lib/openmpi/mca_io_romio.so(ADIO_Close+0x1f9)[0x7f18459b7189]
>> [oriol-VirtualBox:13975] [ 7] /usr/local/lib/openmpi/mca_io_romio.so(mca_io_romio_dist_MPI_File_close+0xe8)[0x7f18459a9dd8]
>> [oriol-VirtualBox:13975] [ 8] /usr/local/lib/libmpi.so.1(+0x3a2c6)[0x7f1857ffa2c6]
>> [oriol-VirtualBox:13975] [ 9] /usr/local/lib/libmpi.so.1(ompi_file_close+0x41)[0x7f1857ffa811]
>> [oriol-VirtualBox:13975] [10] /usr/local/lib/libmpi.so.1(PMPI_File_close+0x78)[0x7f1858036118]
>> [oriol-VirtualBox:13975] [11] ./binary[0x42099e]
>> [oriol-VirtualBox:13975] [12] ./binary[0x48ed86]
>> [oriol-VirtualBox:13975] [13] ./binary[0x40e49f]
>> [oriol-VirtualBox:13975] [14] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f18571ecde5]
>> [oriol-VirtualBox:13975] [15] ./binary[0x40d679]
>> [oriol-VirtualBox:13975] *** End of error message ***
>> [oriol-VirtualBox:13972] *** Process received signal ***
>> [oriol-VirtualBox:13972] Signal: Aborted (6)
>> [oriol-VirtualBox:13972] Signal code: (-6)
>> [oriol-VirtualBox:13972] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36ff0)[0x7f5844a43ff0]
>> [oriol-VirtualBox:13972] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f5844a43f77]
>> [oriol-VirtualBox:13972] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f5844a475e8]
>> [oriol-VirtualBox:13972] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x744fb)[0x7f5844a814fb]
>> [oriol-VirtualBox:13972] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x80996)[0x7f5844a8d996]
>> [oriol-VirtualBox:13972] [ 5] /usr/local/lib/openmpi/mca_io_romio.so(ADIOI_Delete_flattened+0x62)[0x7f58315f2c02]
>> [oriol-VirtualBox:13972] [ 6] /usr/local/lib/openmpi/mca_io_romio.so(ADIO_Close+0x1f9)[0x7f58315d7189]
>> [oriol-VirtualBox:13972] [ 7] /usr/local/lib/openmpi/mca_io_romio.so(mca_io_romio_dist_MPI_File_close+0xe8)[0x7f58315c9dd8]
>> [oriol-VirtualBox:13972] [ 8] /usr/local/lib/libmpi.so.1(+0x3a2c6)[0x7f584583c2c6]
>> [oriol-VirtualBox:13972] [ 9] /usr/local/lib/libmpi.so.1(ompi_file_close+0x41)[0x7f584583c811]
>> [oriol-VirtualBox:13972] [10] /usr/local/lib/libmpi.so.1(PMPI_File_close+0x78)[0x7f5845878118]
>> [oriol-VirtualBox:13972] [11] ./binary[0x42099e]
>> [oriol-VirtualBox:13972] [12] ./binary[0x48ed86]
>> [oriol-VirtualBox:13972] [13] ./binary[0x40e49f]
>> [oriol-VirtualBox:13972] [14] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f5844a2ede5]
>> [oriol-VirtualBox:13972] [15] ./binary[0x40d679]
>> [oriol-VirtualBox:13972] *** End of error message ***
>> --------------------------------------------------------------------------
>> mpirun noticed that process rank 2 with PID 13969 on node oriol-VirtualBox exited on signal 6 (Aborted).
>> --------------------------------------------------------------------------

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA