Subject: Re: [OMPI users] bug in MPI_File_set_view?
From: Rob Latham (robl_at_[hidden])
Date: 2014-05-19 12:23:32


On 05/15/2014 08:32 AM, Edgar Gabriel wrote:
> could you try just for curiosity to force to use OMPIO? e.g.
> mpirun --mca io ompio ....

Edgar, what is in the air that there are now three bug reports against
ROMIO's flattening code in the last month?

We've fixed this upstream in ROMIO by ignoring zero-length blocks, but
George Bosilca suggested Open MPI's fix for that might have been too
aggressive.

For those of you not on the mpich-discuss list, we've determined that
whatever problem MPICH had with Oriol Canela-Xandri's test case has been
fixed in the latest from-git versions.

OMPIO uses Open MPI's datatype processing, so as long as the I/O layer
and the datatype engine handle zero-length blocks the same way,
everything's fine. ROMIO processes datatypes internally (providing a
third implementation of MPI datatype processing. sigh.). If the two
sides disagree about how to handle these special cases, memory errors
such as the one you report can happen.
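
To make the zero-length-block point concrete, here is a minimal sketch in
Python. It is not ROMIO's or Open MPI's actual code: the (offset, length)
block representation and the names count_blocks and flatten are
hypothetical, chosen only for illustration. It shows how a pass that
counts blocks and a pass that fills them in can disagree when only one of
them skips zero-length blocks; in C, that kind of size mismatch is one
plausible way to end up with a corrupted allocation.

```python
from typing import List, Tuple

# Hypothetical representation of one contiguous piece of a datatype:
# (byte offset, byte length). Not an actual ROMIO or Open MPI structure.
Block = Tuple[int, int]


def count_blocks(blocks: List[Block], skip_empty: bool) -> int:
    """Count how many entries a flattened representation would need."""
    return sum(1 for _, length in blocks if length > 0 or not skip_empty)


def flatten(blocks: List[Block], skip_empty: bool) -> List[Block]:
    """Build the flattened list, optionally dropping zero-length blocks."""
    return [(off, length) for off, length in blocks
            if length > 0 or not skip_empty]


# A type with one zero-length block in the middle.
blocks = [(0, 8), (8, 0), (16, 8)]

# Both passes agree when they apply the same rule.
agree = count_blocks(blocks, skip_empty=True) == len(flatten(blocks, skip_empty=True))

# They disagree when one pass counts the empty block and the other drops it.
mismatch = count_blocks(blocks, skip_empty=False) - len(flatten(blocks, skip_empty=True))

# A process that owns no data at all flattens to an empty list.
empty_rank = flatten([(0, 0)], skip_empty=True)
```

The only design point here is that the counting pass and the filling pass
must apply the same rule; which rule they pick matters less than that they
agree.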

==rob

>
> Thanks
> Edgar
>
> On 5/15/2014 3:56 AM, CANELA-XANDRI Oriol wrote:
>> Hi, I installed and tried version 1.8.1, but it also fails. I see the error when some processes have no matrix block at all. That is not a common situation, but it leaves me unsure whether I am doing something wrong. The error I get is:
>>
>> *** Error in `./binary': free(): invalid size: 0x0000000000a34c00 ***
>> [oriol-VirtualBox:13975] *** Process received signal ***
>> [oriol-VirtualBox:13975] Signal: Aborted (6)
>> [oriol-VirtualBox:13975] Signal code: (-6)
>> [oriol-VirtualBox:13969] *** Process received signal ***
>> [oriol-VirtualBox:13969] Signal: Aborted (6)
>> [oriol-VirtualBox:13969] Signal code: (-6)
>> ======= Backtrace: =========
>> /lib/x86_64-linux-gnu/libc.so.6(+0x80996)[0x7f5844a8d996]
>> [oriol-VirtualBox:13969] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36ff0)[0x7f06a50a7ff0]
>> [oriol-VirtualBox:13969] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f06a50a7f77]
>> [oriol-VirtualBox:13969] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f06a50ab5e8]
>> [oriol-VirtualBox:13969] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x744fb)[0x7f06a50e54fb]
>> [oriol-VirtualBox:13969] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x80996)[0x7f06a50f1996]
>> [oriol-VirtualBox:13969] [ 5] /usr/local/lib/openmpi/mca_io_romio.so(ADIOI_Delete_flattened+0x62)[0x7f0691e12c02]
>> [oriol-VirtualBox:13969] [ 6] /usr/local/lib/openmpi/mca_io_romio.so(ADIO_Close+0x1f9)[0x7f0691df7189]
>> [oriol-VirtualBox:13969] [ 7] /usr/local/lib/openmpi/mca_io_romio.so(mca_io_romio_dist_MPI_File_close+0xe8)[0x7f0691de9dd8]
>> [oriol-VirtualBox:13969] [ 8] /usr/local/lib/libmpi.so.1(+0x3a2c6)[0x7f06a5ea02c6]
>> [oriol-VirtualBox:13969] [ 9] /usr/local/lib/libmpi.so.1(ompi_file_close+0x41)[0x7f06a5ea0811]
>> [oriol-VirtualBox:13969] [10] /usr/local/lib/libmpi.so.1(PMPI_File_close+0x78)[0x7f06a5edc118]
>> [oriol-VirtualBox:13969] [11] ./binary[0x42099e]
>> [oriol-VirtualBox:13969] [12] ./binary[0x48ed86]
>> [oriol-VirtualBox:13969] [13] ./binary[0x40e49f]
>> [oriol-VirtualBox:13969] [14] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f06a5092de5]
>> [oriol-VirtualBox:13969] [15] ./binary[0x40d679]
>> [oriol-VirtualBox:13969] *** End of error message ***
>> [oriol-VirtualBox:13975] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36ff0)[0x7f1857201ff0]
>> [oriol-VirtualBox:13975] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f1857201f77]
>> [oriol-VirtualBox:13975] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f18572055e8]
>> [oriol-VirtualBox:13975] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x744fb)[0x7f185723f4fb]
>> [oriol-VirtualBox:13975] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x80996)[0x7f185724b996]
>> [oriol-VirtualBox:13975] [ 5] /usr/local/lib/openmpi/mca_io_romio.so(ADIOI_Delete_flattened+0x62)[0x7f18459d2c02]
>> [oriol-VirtualBox:13975] [ 6] /usr/local/lib/openmpi/mca_io_romio.so(ADIO_Close+0x1f9)[0x7f18459b7189]
>> [oriol-VirtualBox:13975] [ 7] /usr/local/lib/openmpi/mca_io_romio.so(mca_io_romio_dist_MPI_File_close+0xe8)[0x7f18459a9dd8]
>> [oriol-VirtualBox:13975] [ 8] /usr/local/lib/libmpi.so.1(+0x3a2c6)[0x7f1857ffa2c6]
>> [oriol-VirtualBox:13975] [ 9] /usr/local/lib/libmpi.so.1(ompi_file_close+0x41)[0x7f1857ffa811]
>> [oriol-VirtualBox:13975] [10] /usr/local/lib/libmpi.so.1(PMPI_File_close+0x78)[0x7f1858036118]
>> [oriol-VirtualBox:13975] [11] ./binary[0x42099e]
>> [oriol-VirtualBox:13975] [12] ./binary[0x48ed86]
>> [oriol-VirtualBox:13975] [13] ./binary[0x40e49f]
>> [oriol-VirtualBox:13975] [14] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f18571ecde5]
>> [oriol-VirtualBox:13975] [15] ./binary[0x40d679]
>> [oriol-VirtualBox:13975] *** End of error message ***
>> [oriol-VirtualBox:13972] *** Process received signal ***
>> [oriol-VirtualBox:13972] Signal: Aborted (6)
>> [oriol-VirtualBox:13972] Signal code: (-6)
>> [oriol-VirtualBox:13972] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36ff0)[0x7f5844a43ff0]
>> [oriol-VirtualBox:13972] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f5844a43f77]
>> [oriol-VirtualBox:13972] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f5844a475e8]
>> [oriol-VirtualBox:13972] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x744fb)[0x7f5844a814fb]
>> [oriol-VirtualBox:13972] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x80996)[0x7f5844a8d996]
>> [oriol-VirtualBox:13972] [ 5] /usr/local/lib/openmpi/mca_io_romio.so(ADIOI_Delete_flattened+0x62)[0x7f58315f2c02]
>> [oriol-VirtualBox:13972] [ 6] /usr/local/lib/openmpi/mca_io_romio.so(ADIO_Close+0x1f9)[0x7f58315d7189]
>> [oriol-VirtualBox:13972] [ 7] /usr/local/lib/openmpi/mca_io_romio.so(mca_io_romio_dist_MPI_File_close+0xe8)[0x7f58315c9dd8]
>> [oriol-VirtualBox:13972] [ 8] /usr/local/lib/libmpi.so.1(+0x3a2c6)[0x7f584583c2c6]
>> [oriol-VirtualBox:13972] [ 9] /usr/local/lib/libmpi.so.1(ompi_file_close+0x41)[0x7f584583c811]
>> [oriol-VirtualBox:13972] [10] /usr/local/lib/libmpi.so.1(PMPI_File_close+0x78)[0x7f5845878118]
>> [oriol-VirtualBox:13972] [11] ./binary[0x42099e]
>> [oriol-VirtualBox:13972] [12] ./binary[0x48ed86]
>> [oriol-VirtualBox:13972] [13] ./binary[0x40e49f]
>> [oriol-VirtualBox:13972] [14] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f5844a2ede5]
>> [oriol-VirtualBox:13972] [15] ./binary[0x40d679]
>> [oriol-VirtualBox:13972] *** End of error message ***
>> --------------------------------------------------------------------------
>> mpirun noticed that process rank 2 with PID 13969 on node oriol-VirtualBox exited on signal 6 (Aborted).
>> --------------------------------------------------------------------------

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA