Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] bug in MPI_File_set_view?
From: CANELA-XANDRI Oriol (Oriol.CAnela-Xandri_at_[hidden])
Date: 2014-05-15 04:56:10


Hi, I installed and tried version 1.8.1, but it also fails. I see the error when some processes have no matrix block at all. It's not a common situation, but it leaves me unsure whether I am doing something wrong. The error I get is:

*** Error in `./binary': free(): invalid size: 0x0000000000a34c00 ***
[oriol-VirtualBox:13975] *** Process received signal ***
[oriol-VirtualBox:13975] Signal: Aborted (6)
[oriol-VirtualBox:13975] Signal code: (-6)
[oriol-VirtualBox:13969] *** Process received signal ***
[oriol-VirtualBox:13969] Signal: Aborted (6)
[oriol-VirtualBox:13969] Signal code: (-6)
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x80996)[0x7f5844a8d996]
[oriol-VirtualBox:13969] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36ff0)[0x7f06a50a7ff0]
[oriol-VirtualBox:13969] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f06a50a7f77]
[oriol-VirtualBox:13969] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f06a50ab5e8]
[oriol-VirtualBox:13969] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x744fb)[0x7f06a50e54fb]
[oriol-VirtualBox:13969] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x80996)[0x7f06a50f1996]
[oriol-VirtualBox:13969] [ 5] /usr/local/lib/openmpi/mca_io_romio.so(ADIOI_Delete_flattened+0x62)[0x7f0691e12c02]
[oriol-VirtualBox:13969] [ 6] /usr/local/lib/openmpi/mca_io_romio.so(ADIO_Close+0x1f9)[0x7f0691df7189]
[oriol-VirtualBox:13969] [ 7] /usr/local/lib/openmpi/mca_io_romio.so(mca_io_romio_dist_MPI_File_close+0xe8)[0x7f0691de9dd8]
[oriol-VirtualBox:13969] [ 8] /usr/local/lib/libmpi.so.1(+0x3a2c6)[0x7f06a5ea02c6]
[oriol-VirtualBox:13969] [ 9] /usr/local/lib/libmpi.so.1(ompi_file_close+0x41)[0x7f06a5ea0811]
[oriol-VirtualBox:13969] [10] /usr/local/lib/libmpi.so.1(PMPI_File_close+0x78)[0x7f06a5edc118]
[oriol-VirtualBox:13969] [11] ./binary[0x42099e]
[oriol-VirtualBox:13969] [12] ./binary[0x48ed86]
[oriol-VirtualBox:13969] [13] ./binary[0x40e49f]
[oriol-VirtualBox:13969] [14] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f06a5092de5]
[oriol-VirtualBox:13969] [15] ./binary[0x40d679]
[oriol-VirtualBox:13969] *** End of error message ***
[oriol-VirtualBox:13975] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36ff0)[0x7f1857201ff0]
[oriol-VirtualBox:13975] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f1857201f77]
[oriol-VirtualBox:13975] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f18572055e8]
[oriol-VirtualBox:13975] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x744fb)[0x7f185723f4fb]
[oriol-VirtualBox:13975] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x80996)[0x7f185724b996]
[oriol-VirtualBox:13975] [ 5] /usr/local/lib/openmpi/mca_io_romio.so(ADIOI_Delete_flattened+0x62)[0x7f18459d2c02]
[oriol-VirtualBox:13975] [ 6] /usr/local/lib/openmpi/mca_io_romio.so(ADIO_Close+0x1f9)[0x7f18459b7189]
[oriol-VirtualBox:13975] [ 7] /usr/local/lib/openmpi/mca_io_romio.so(mca_io_romio_dist_MPI_File_close+0xe8)[0x7f18459a9dd8]
[oriol-VirtualBox:13975] [ 8] /usr/local/lib/libmpi.so.1(+0x3a2c6)[0x7f1857ffa2c6]
[oriol-VirtualBox:13975] [ 9] /usr/local/lib/libmpi.so.1(ompi_file_close+0x41)[0x7f1857ffa811]
[oriol-VirtualBox:13975] [10] /usr/local/lib/libmpi.so.1(PMPI_File_close+0x78)[0x7f1858036118]
[oriol-VirtualBox:13975] [11] ./binary[0x42099e]
[oriol-VirtualBox:13975] [12] ./binary[0x48ed86]
[oriol-VirtualBox:13975] [13] ./binary[0x40e49f]
[oriol-VirtualBox:13975] [14] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f18571ecde5]
[oriol-VirtualBox:13975] [15] ./binary[0x40d679]
[oriol-VirtualBox:13975] *** End of error message ***
[oriol-VirtualBox:13972] *** Process received signal ***
[oriol-VirtualBox:13972] Signal: Aborted (6)
[oriol-VirtualBox:13972] Signal code: (-6)
[oriol-VirtualBox:13972] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36ff0)[0x7f5844a43ff0]
[oriol-VirtualBox:13972] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f5844a43f77]
[oriol-VirtualBox:13972] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f5844a475e8]
[oriol-VirtualBox:13972] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x744fb)[0x7f5844a814fb]
[oriol-VirtualBox:13972] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x80996)[0x7f5844a8d996]
[oriol-VirtualBox:13972] [ 5] /usr/local/lib/openmpi/mca_io_romio.so(ADIOI_Delete_flattened+0x62)[0x7f58315f2c02]
[oriol-VirtualBox:13972] [ 6] /usr/local/lib/openmpi/mca_io_romio.so(ADIO_Close+0x1f9)[0x7f58315d7189]
[oriol-VirtualBox:13972] [ 7] /usr/local/lib/openmpi/mca_io_romio.so(mca_io_romio_dist_MPI_File_close+0xe8)[0x7f58315c9dd8]
[oriol-VirtualBox:13972] [ 8] /usr/local/lib/libmpi.so.1(+0x3a2c6)[0x7f584583c2c6]
[oriol-VirtualBox:13972] [ 9] /usr/local/lib/libmpi.so.1(ompi_file_close+0x41)[0x7f584583c811]
[oriol-VirtualBox:13972] [10] /usr/local/lib/libmpi.so.1(PMPI_File_close+0x78)[0x7f5845878118]
[oriol-VirtualBox:13972] [11] ./binary[0x42099e]
[oriol-VirtualBox:13972] [12] ./binary[0x48ed86]
[oriol-VirtualBox:13972] [13] ./binary[0x40e49f]
[oriol-VirtualBox:13972] [14] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f5844a2ede5]
[oriol-VirtualBox:13972] [15] ./binary[0x40d679]
[oriol-VirtualBox:13972] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 13969 on node oriol-VirtualBox exited on signal 6 (Aborted).
--------------------------------------------------------------------------
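
In case it helps to reproduce this, I think the situation can be reduced to a small standalone test along the lines of the sketch below. The 2x2 sizes and the "test.bin" filename are arbitrary choices for illustration, picked so that with four processes only rank 0 owns a block; otherwise it mirrors the function from my original message below:

#include <mpi.h>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  // Illustrative sizes: a 2x2 matrix distributed in 2x2 blocks over a
  // 2x2 process grid leaves every rank except (0,0) without any block.
  int dims[]     = {2, 2};
  int distribs[] = {MPI_DISTRIBUTE_CYCLIC, MPI_DISTRIBUTE_CYCLIC};
  int dargs[]    = {2, 2};
  int pgrid[]    = {2, 2};   // run with mpirun -np 4

  MPI_Datatype dcarray;
  MPI_Type_create_darray(size, rank, 2, dims, distribs, dargs, pgrid,
                         MPI_ORDER_FORTRAN, MPI_DOUBLE, &dcarray);
  MPI_Type_commit(&dcarray);

  char fname[] = "test.bin";   // arbitrary output file name
  char nat[]   = "native";
  MPI_File fh;
  MPI_File_open(MPI_COMM_WORLD, fname, MPI_MODE_CREATE | MPI_MODE_WRONLY,
                MPI_INFO_NULL, &fh);
  MPI_File_set_view(fh, 0, MPI_DOUBLE, dcarray, nat, MPI_INFO_NULL);

  double data[4] = {0.0, 0.0, 0.0, 0.0};
  int count = (rank == 0) ? 4 : 0;   // ranks without a block write nothing
  MPI_Status status;
  MPI_File_write_all(fh, data, count, MPI_DOUBLE, &status);

  // In my runs the abort (invalid free inside ROMIO) shows up here,
  // when the file is closed.
  MPI_File_close(&fh);
  MPI_Type_free(&dcarray);
  MPI_Finalize();
  return 0;
}

With mpirun -np 4, ranks 1 to 3 pass a zero count to MPI_File_write_all, which is the case that triggers the abort for me.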

-----Original Message-----
From: users [mailto:users-bounces_at_[hidden]] On Behalf Of Ralph Castain
Sent: 14 May 2014 16:24
To: Open MPI Users
Subject: Re: [OMPI users] bug in MPI_File_set_view?
Our initial thinking was the first half of June, but that is subject to change depending on the severity of reported errors. FWIW: I don't believe we made any ROMIO changes between 1.8.1 and the current 1.8.2 state, so using 1.8.1 should be a valid test.
On May 14, 2014, at 8:16 AM, Bennet Fauber <bennet_at_[hidden]> wrote:
> Is there an ETA for 1.8.2 general release instead of snapshot?
> 
> Thanks,  -- bennet
> 
> On Wed, May 14, 2014 at 10:17 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>> You might give it a try with 1.8.1 or the nightly snapshot from 1.8.2 
>> - we updated ROMIO since the 1.6 series, and whatever fix is required 
>> may be in the newer version
>> 
>> 
>> On May 14, 2014, at 6:52 AM, CANELA-XANDRI Oriol <Oriol.CAnela-Xandri_at_[hidden]> wrote:
>> 
>>> Hello,
>>> 
>>> I am using MPI IO for writing/reading a block-cyclic distributed matrix to/from a file.
>>> 
>>> It works fine except when some MPI processes have no data (i.e., when the matrix is small enough, or the block size big enough, that some processes in the grid do not have any matrix block). In this case, I receive an error when calling MPI_File_set_view saying that the data cannot be freed. I tried with versions 1.3 and 1.6. When I try with MPICH it works without errors. Could this be a bug?
>>> 
>>> My function is shown below (nBlockRows/nBlockCols define the size of the blocks, nGlobRows/nGlobCols the global size of the matrix, nProcRows/nProcCols the dimensions of the process grid, and fname the name of the file):
>>> 
>>> void Matrix::writeMatrixMPI(std::string fname) {
>>>   int dims[] = {this->nGlobRows, this->nGlobCols};
>>>   int dargs[] = {this->nBlockRows, this->nBlockCols};
>>>   int distribs[] = {MPI_DISTRIBUTE_CYCLIC, MPI_DISTRIBUTE_CYCLIC};
>>>   int dim[] = {communicator->nProcRows, communicator->nProcCols};
>>>   char nat[] = "native";
>>>   int rc;
>>>   MPI_Datatype dcarray;
>>>   MPI_File cFile;
>>>   MPI_Status status;
>>> 
>>>   // Build the darray type describing this rank's share of the
>>>   // block-cyclic distribution.
>>>   MPI_Type_create_darray(communicator->mpiNumTasks, communicator->mpiRank,
>>>                          2, dims, distribs, dargs, dim, MPI_ORDER_FORTRAN,
>>>                          MPI_DOUBLE, &dcarray);
>>>   MPI_Type_commit(&dcarray);
>>> 
>>>   std::vector<char> fn(fname.begin(), fname.end());
>>>   fn.push_back('\0');
>>>   rc = MPI_File_open(MPI_COMM_WORLD, &fn[0],
>>>                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL,
>>>                      &cFile);
>>>   if (rc) {
>>>     std::stringstream ss;
>>>     ss << "Error: Failed to open file: " << rc;
>>>     misc.error(ss.str(), 0);
>>>   }
>>>   else {
>>>     // Set the file view to the distributed layout, then write this
>>>     // rank's local blocks collectively.
>>>     MPI_File_set_view(cFile, 0, MPI_DOUBLE, dcarray, nat, MPI_INFO_NULL);
>>>     MPI_File_write_all(cFile, this->m, this->nRows*this->nCols,
>>>                        MPI_DOUBLE, &status);
>>>   }
>>>   MPI_File_close(&cFile);
>>>   MPI_Type_free(&dcarray);
>>> }
>>> 
>>> Best regards,
>>> 
>>> Oriol
>>> 