
Open MPI User's Mailing List Archives


Subject: [OMPI users] Issues with MPI_Type_create_darray
From: Antonio Molins (amolins_at_[hidden])
Date: 2008-10-30 17:21:01


Hi all,

I am having some trouble with this function. I want to map data onto a
2x2 block-cyclic process grid in C, using this code:

        MPI_Barrier(blacs_comm);
        // size of each matrix
        int *array_of_gsizes = new int[2];
        array_of_gsizes[0] = this->nx;
        array_of_gsizes[1] = this->ny;
        // block-cyclic distribution used by ScaLAPACK
        int *array_of_distrs = new int[2];
        array_of_distrs[0] = MPI_DISTRIBUTE_CYCLIC;
        array_of_distrs[1] = MPI_DISTRIBUTE_CYCLIC;
        int *array_of_dargs = new int[2];
        array_of_dargs[0] = BLOCK_SIZE;
        array_of_dargs[1] = BLOCK_SIZE;
        int *array_of_psizes = new int[2];
        array_of_psizes[0] = Pr;
        array_of_psizes[1] = Pc;
        int rank = pc + pr*Pc;
        MPI_Type_create_darray(Pr*Pc, rank, 2,
                               array_of_gsizes, array_of_distrs,
                               array_of_dargs, array_of_psizes,
                               MPI_ORDER_C, MPI_DOUBLE, &this->datatype);
        MPI_Type_commit(&this->datatype);
        int typesize;
        MPI_Aint typeextent;
        MPI_Type_size(this->datatype, &typesize);
        MPI_Type_extent(this->datatype, &typeextent);
        printf("type size for process rank (%d,%d) is %d doubles, "
               "type extent is %d doubles (up to %d).\n",
               pr, pc, typesize/(int)sizeof(double),
               (int)(typeextent/sizeof(double)), nx*ny);
        MPI_File_open(blacs_comm, (char*)filename, MPI_MODE_RDWR,
                      MPI_INFO_NULL, &this->fid);
        MPI_File_set_view(this->fid, this->offset + i*nx*ny*sizeof(double),
                          MPI_DOUBLE, this->datatype, "native", MPI_INFO_NULL);
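For anyone wanting to reason about what the darray type should contain, the cyclic(b) ownership rule can be written out in a few lines of Python (no MPI, just index arithmetic). The sizes here (a 13x7 array, 4x4 blocks, a 2x2 process grid) are example values, not necessarily the ones in my run:

```python
# Sketch of the cyclic(b) rule that MPI_DISTRIBUTE_CYCLIC with
# distribution argument b describes along one dimension: global index i
# belongs to process coordinate (i // b) mod p.
def cyclic_indices(n, b, p, coord):
    """Global indices (out of n) owned by process coordinate `coord`."""
    return [i for i in range(n) if (i // b) % p == coord]

# Example values only (not necessarily the real nx/ny/BLOCK_SIZE):
nx, ny = 13, 7        # global matrix dimensions
BLOCK_SIZE = 4
Pr, Pc = 2, 2         # process grid

for pr in range(Pr):
    for pc in range(Pc):
        rows = cyclic_indices(nx, BLOCK_SIZE, Pr, pr)
        cols = cyclic_indices(ny, BLOCK_SIZE, Pc, pc)
        # MPI_Type_size of the darray type should equal the number of
        # locally owned elements times sizeof(double).
        print((pr, pc), len(rows) * len(cols))
```

However the blocks fall, the per-process counts must sum to nx*ny, which is a quick way to check the "type size" numbers a run prints.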

This works well as written, but the problem is that the matrix itself
is stored on disk in column-major order, so I want to set up the view
as if I were reading the transpose, that is:

        MPI_Barrier(blacs_comm);
        // size of each matrix, with the dimensions swapped
        int *array_of_gsizes = new int[2];
        array_of_gsizes[0] = this->ny;
        array_of_gsizes[1] = this->nx;
        // block-cyclic distribution used by ScaLAPACK
        int *array_of_distrs = new int[2];
        array_of_distrs[0] = MPI_DISTRIBUTE_CYCLIC;
        array_of_distrs[1] = MPI_DISTRIBUTE_CYCLIC;
        int *array_of_dargs = new int[2];
        array_of_dargs[0] = BLOCK_SIZE;
        array_of_dargs[1] = BLOCK_SIZE;
        int *array_of_psizes = new int[2];
        array_of_psizes[0] = Pr;
        array_of_psizes[1] = Pc;
        int rank = pr + pc*Pr;
        MPI_Type_create_darray(Pr*Pc, rank, 2,
                               array_of_gsizes, array_of_distrs,
                               array_of_dargs, array_of_psizes,
                               MPI_ORDER_C, MPI_DOUBLE, &this->datatype);
        MPI_Type_commit(&this->datatype);
        int typesize;
        MPI_Aint typeextent;
        MPI_Type_size(this->datatype, &typesize);
        MPI_Type_extent(this->datatype, &typeextent);
        printf("type size for process rank (%d,%d) is %d doubles, "
               "type extent is %d doubles (up to %d).\n",
               pr, pc, typesize/(int)sizeof(double),
               (int)(typeextent/sizeof(double)), nx*ny);
        MPI_File_open(blacs_comm, (char*)filename, MPI_MODE_RDWR,
                      MPI_INFO_NULL, &this->fid);
        MPI_File_set_view(this->fid, this->offset + i*nx*ny*sizeof(double),
                          MPI_DOUBLE, this->datatype, "native", MPI_INFO_NULL);
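For what it's worth, the identity I am relying on when swapping the dimensions is that an nx-by-ny matrix stored column-major is exactly the same flat sequence as its ny-by-nx transpose stored row-major. A quick pure-Python check of that identity (toy 3x4 matrix, nothing to do with my actual sizes):

```python
# A matrix written to disk in column-major order is, element for
# element, the same flat sequence as its transpose written row-major.
nx, ny = 3, 4  # toy sizes
A = [[10 * i + j for j in range(ny)] for i in range(nx)]   # nx x ny

# Column-major (Fortran-order) flattening of A:
col_major = [A[i][j] for j in range(ny) for i in range(nx)]

# Row-major (C-order) flattening of the ny x nx transpose:
T = [[A[i][j] for i in range(nx)] for j in range(ny)]
row_major_T = [x for row in T for x in row]

assert col_major == row_major_T
print(col_major)  # [0, 10, 20, 1, 11, 21, 2, 12, 22, 3, 13, 23]
```

So, in principle, describing the file as a ny-by-nx row-major array (with the process-grid coordinates swapped accordingly) should address exactly the same bytes on disk.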

To my surprise, this code crashes inside MPI_File_set_view()!
And before you ask: I did try switching MPI_ORDER_C to
MPI_ORDER_FORTRAN, and I got the same results I am reporting here.

Also, I am quite intrigued by the text output of these two versions.
The first one reports:

        type size for process rank (0,0) is 32 doubles, type extent is 91 doubles (up to 91).
        type size for process rank (1,0) is 20 doubles, type extent is 119 doubles (up to 91).
        type size for process rank (0,1) is 24 doubles, type extent is 95 doubles (up to 91).
        type size for process rank (1,1) is 15 doubles, type extent is 123 doubles (up to 91).

Anybody know why the extents are not equal???

Even weirder, the second one will report:

        type size for process rank (0,0) is 32 doubles, type extent is 91 doubles (up to 91).
        type size for process rank (1,0) is 20 doubles, type extent is 95 doubles (up to 91).
        type size for process rank (0,1) is 24 doubles, type extent is 143 doubles (up to 91).
        type size for process rank (1,1) is 15 doubles, type extent is 147 doubles (up to 91).

The extents changed! I think this is somehow related to the subsequent
crash in MPI_File_set_view(), but that is as far as my understanding goes...
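My naive expectation (which may well be wrong, and is partly why I am asking) is that every element a process touches lies inside the nx*ny-element array, so no type should need an extent beyond nx*ny doubles. The span each process actually covers in the row-major file can be computed directly; the sizes here (13x7 array, 4x4 blocks, 2x2 grid) are again made-up example values:

```python
# For each process coordinate, compute the flattened row-major offsets
# of the elements it owns under a 2-D cyclic(b) distribution, and the
# span (first..last offset) its file-view type would naively cover.
def cyclic_indices(n, b, p, coord):
    # global indices along one dimension owned by process `coord`
    return [i for i in range(n) if (i // b) % p == coord]

# Made-up example sizes (not necessarily the real ones):
nx, ny, b, Pr, Pc = 13, 7, 4, 2, 2

for pr in range(Pr):
    for pc in range(Pc):
        offs = [i * ny + j
                for i in cyclic_indices(nx, b, Pr, pr)
                for j in cyclic_indices(ny, b, Pc, pc)]
        # Every owned offset is < nx*ny, which is why extents larger
        # than nx*ny doubles surprise me in the output above.
        print((pr, pc), min(offs), max(offs))
```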

Any clue about what is happening? I attach the trace below.

Best,
A

--------------------------------------------------------------------------------
Antonio Molins, PhD Candidate
Medical Engineering and Medical Physics
Harvard - MIT Division of Health Sciences and Technology

--
"When a traveler reaches a fork in the road,
the ℓ1-norm tells him to take either one way or the other,
but the ℓ2-norm instructs him to head off into the bushes."
			John F. Claerbout and Francis Muir, 1973
--------------------------------------------------------------------------------
*** glibc detected *** double free or corruption (!prev): 0x0000000000cf4130 ***
[login4:26709] *** Process received signal ***
[login4:26708] *** Process received signal ***
[login4:26708] Signal: Aborted (6)
[login4:26708] Signal code:  (-6)
[login4:26709] Signal: Segmentation fault (11)
[login4:26709] Signal code: Address not mapped (1)
[login4:26709] Failing at address: 0x18
[login4:26708] [ 0] /lib64/tls/libpthread.so.0 [0x36ff10c5b0]
[login4:26708] [ 1] /lib64/tls/libc.so.6(gsignal+0x3d) [0x36fe62e26d]
[login4:26708] [ 2] /lib64/tls/libc.so.6(abort+0xfe) [0x36fe62fa6e]
[login4:26708] [ 3] /lib64/tls/libc.so.6 [0x36fe6635f1]
[login4:26708] [ 4] /lib64/tls/libc.so.6 [0x36fe6691fe]
[login4:26708] [ 5] /lib64/tls/libc.so.6(__libc_free+0x76) [0x36fe669596]
[login4:26708] [ 6] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.0 [0x2a962cc4ae]
[login4:26708] [ 7] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.0(ompi_ddt_destroy+0x65) [0x2a962cd31d]
[login4:26708] [ 8] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.0(MPI_Type_free+0x5b) [0x2a962f654f]
[login4:26708] [ 9] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/mca_io_romio.so(ADIOI_Flatten+0x1804) [0x2aa4603612]
[login4:26708] [10] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/mca_io_romio.so(ADIOI_Flatten_datatype+0xe7) [0x2aa46017fd]
[login4:26708] [11] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/mca_io_romio.so(ADIO_Set_view+0x14f) [0x2aa45ecb57]
[login4:26708] [12] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/mca_io_romio.so(mca_io_romio_dist_MPI_File_set_view+0x1dd) [0x2aa46088a9]
[login4:26708] [13] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/mca_io_romio.so [0x2aa45ec288]
[login4:26708] [14] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.0(MPI_File_set_view+0x53) [0x2a963002ff]
[login4:26708] [15] ./bin/test2(_ZN14pMatCollection3getEiP7pMatrix+0xc3) [0x42a50b]
[login4:26708] [16] ./bin/test2(main+0xc2e) [0x43014a]
[login4:26708] [17] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x36fe61c40b]
[login4:26708] [18] ./bin/test2(_ZNSt8ios_base4InitD1Ev+0x42) [0x41563a]
[login4:26708] *** End of error message ***
[login4:26709] [ 0] /lib64/tls/libpthread.so.0 [0x36ff10c5b0]
[login4:26709] [ 1] /lib64/tls/libc.so.6 [0x36fe66882b]
[login4:26709] [ 2] /lib64/tls/libc.so.6 [0x36fe668f8d]
[login4:26709] [ 3] /lib64/tls/libc.so.6(__libc_free+0x76) [0x36fe669596]
[login4:26709] [ 4] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.0 [0x2a962cc4ae]
[login4:26709] [ 5] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.0(ompi_ddt_release_args+0x93) [0x2a962d5641]
[login4:26709] [ 6] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.0 [0x2a962cc514]
[login4:26709] [ 7] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.0(ompi_ddt_release_args+0x93) [0x2a962d5641]
[login4:26709] [ 8] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.0 [0x2a962cc514]
[login4:26709] [ 9] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.0(ompi_ddt_destroy+0x65) [0x2a962cd31d]
[login4:26709] [10] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.0(MPI_Type_free+0x5b) [0x2a962f654f]
[login4:26709] [11] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/mca_io_romio.so(ADIOI_Flatten+0x147) [0x2aa4601f55]
[login4:26709] [12] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/mca_io_romio.so(ADIOI_Flatten+0x1569) [0x2aa4603377]
[login4:26709] [13] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/mca_io_romio.so(ADIOI_Flatten_datatype+0xe7) [0x2aa46017fd]
[login4:26709] [14] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/mca_io_romio.so(ADIO_Set_view+0x14f) [0x2aa45ecb57]
[login4:26709] [15] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/mca_io_romio.so(mca_io_romio_dist_MPI_File_set_view+0x1dd) [0x2aa46088a9]
[login4:26709] [16] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/mca_io_romio.so [0x2aa45ec288]
[login4:26709] [17] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.0(MPI_File_set_view+0x53) [0x2a963002ff]
[login4:26709] [18] ./bin/test2(_ZN14pMatCollection3getEiP7pMatrix+0xc3) [0x42a50b]
[login4:26709] [19] ./bin/test2(main+0xc2e) [0x43014a]
[login4:26709] [20] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x36fe61c40b]
[login4:26709] [21] ./bin/test2(_ZNSt8ios_base4InitD1Ev+0x42) [0x41563a]
[login4:26709] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 26708 on node  
login4.ranger.tacc.utexas.edu exited on signal 6 (Aborted).
--------------------------------------------------------------------------