Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Issues with MPI_Type_create_darray
From: Antonio Molins (amolins_at_[hidden])
Date: 2008-10-31 09:44:58


Hi again,

Using MPI_Type_get_true_extent(), I changed the way I report the type
size and extent to:

        int typesize;
        MPI_Aint typeextent, typelb;
        MPI_Type_size(this->datatype, &typesize);
        MPI_Type_get_true_extent(this->datatype, &typelb, &typeextent);
        //MPI_Type_lb(this->datatype, &typelb);
        //MPI_Type_extent(this->datatype, &typeextent);
        printf("\ntype size for process rank (%d,%d) is %d doubles, "
               "type extent is %d doubles (up to %d), range is [%d, %d].\n",
               pr, pc, typesize/(int)sizeof(double),
               (int)(typeextent/sizeof(double)), nx*ny,
               (int)(typelb/sizeof(double)),
               (int)((typelb+typeextent)/sizeof(double)));
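
For reference, here is a minimal standalone sketch (not my actual code;
the 7x13 global size, the block size of 2, and the MPI_Dims_create grid
are arbitrary choices for illustration) that builds a small darray type
and prints its size, extent, and true extent side by side:

        #include <mpi.h>
        #include <cstdio>

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);

            int rank, nprocs;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

            // 2-D block-cyclic decomposition of a 7x13 global array of doubles
            int gsizes[2] = {7, 13};
            int distrs[2] = {MPI_DISTRIBUTE_CYCLIC, MPI_DISTRIBUTE_CYCLIC};
            int dargs[2]  = {2, 2};
            int psizes[2] = {0, 0};
            MPI_Dims_create(nprocs, 2, psizes);   // pick a process grid

            MPI_Datatype darray;
            MPI_Type_create_darray(nprocs, rank, 2, gsizes, distrs, dargs,
                                   psizes, MPI_ORDER_C, MPI_DOUBLE, &darray);
            MPI_Type_commit(&darray);

            int size;
            MPI_Aint lb, extent, true_lb, true_extent;
            MPI_Type_size(darray, &size);
            MPI_Type_get_extent(darray, &lb, &extent);
            MPI_Type_get_true_extent(darray, &true_lb, &true_extent);

            printf("rank %d: size=%d bytes, extent=[%ld,%ld), "
                   "true extent=[%ld,%ld)\n",
                   rank, size, (long)lb, (long)(lb + extent),
                   (long)true_lb, (long)(true_lb + true_extent));

            MPI_Type_free(&darray);
            MPI_Finalize();
            return 0;
        }

Run on a few ranks, this makes it easy to compare the extent and the
true extent of the same committed type.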
        
The reporting code now gives me the correct answers for both
situations. For the first one (the version that works):

        type size for process rank (1,0) is 20 doubles, type extent is 60 doubles (up to 91), range is [28, 88].
        type size for process rank (0,0) is 32 doubles, type extent is 81 doubles (up to 91), range is [0, 81].
        type size for process rank (0,1) is 24 doubles, type extent is 80 doubles (up to 91), range is [4, 84].
        type size for process rank (1,1) is 15 doubles, type extent is 59 doubles (up to 91), range is [32, 91].

For the second one (printed just before I get the same double-free
error in MPI_File_set_view):

        type size for process rank (1,0) is 20 doubles, type extent is 48 doubles (up to 91), range is [4, 52].
        type size for process rank (0,0) is 32 doubles, type extent is 51 doubles (up to 91), range is [0, 51].
        type size for process rank (0,1) is 24 doubles, type extent is 38 doubles (up to 91), range is [52, 90].
        type size for process rank (1,1) is 15 doubles, type extent is 35 doubles (up to 91), range is [56, 91].

Can anybody give me a hint here? Is there a bug in
MPI_Type_create_darray I should be aware of?
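
In case it helps frame the question, my understanding (an assumption on
my part, not something taken from the code quoted below) is that a
column-major file is usually described by keeping the global sizes in
their logical (nx, ny) order and passing MPI_ORDER_FORTRAN, with the
process-grid rank always computed row-major as pr*Pc + pc, since the
standard ranks the process grid row-major regardless of the order
argument. A sketch of that variant, as a hypothetical helper:

        #include <mpi.h>

        // Hypothetical helper (my own sketch): describe a column-major
        // (Fortran-order) file holding an nx x ny matrix of doubles,
        // distributed block-cyclically over a Pr x Pc process grid.
        MPI_Datatype make_colmajor_filetype(int nx, int ny,
                                            int Pr, int Pc,
                                            int pr, int pc,
                                            int block_size)
        {
            int gsizes[2] = {nx, ny};   // logical dimensions: rows, columns
            int distrs[2] = {MPI_DISTRIBUTE_CYCLIC, MPI_DISTRIBUTE_CYCLIC};
            int dargs[2]  = {block_size, block_size};
            int psizes[2] = {Pr, Pc};
            int grid_rank = pr*Pc + pc; // row-major process-grid ordering

            MPI_Datatype filetype;
            MPI_Type_create_darray(Pr*Pc, grid_rank, 2,
                                   gsizes, distrs, dargs, psizes,
                                   MPI_ORDER_FORTRAN, MPI_DOUBLE, &filetype);
            MPI_Type_commit(&filetype);
            return filetype;
        }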

Best,
A

On Oct 30, 2008, at 5:21 PM, Antonio Molins wrote:

> Hi all,
>
> I am having some trouble with this function. I want to map data onto a
> 2x2 process grid with a block-cyclic distribution in C, using this code:
>
> MPI_Barrier(blacs_comm);
> // size of each matrix
> int *array_of_gsizes = new int[2];
> array_of_gsizes[0] = this->nx;
> array_of_gsizes[1] = this->ny;
> // block-cyclic distribution used by ScaLAPACK
> int *array_of_distrs = new int[2];
> array_of_distrs[0] = MPI_DISTRIBUTE_CYCLIC;
> array_of_distrs[1] = MPI_DISTRIBUTE_CYCLIC;
> int *array_of_dargs = new int[2];
> array_of_dargs[0] = BLOCK_SIZE;
> array_of_dargs[1] = BLOCK_SIZE;
> int *array_of_psizes = new int[2];
> array_of_psizes[0] = Pr;
> array_of_psizes[1] = Pc;
> int rank = pc + pr*Pc;
> MPI_Type_create_darray(Pr*Pc, rank, 2,
>                        array_of_gsizes, array_of_distrs, array_of_dargs,
>                        array_of_psizes, MPI_ORDER_C, MPI_DOUBLE,
>                        &this->datatype);
> MPI_Type_commit(&this->datatype);
> int typesize;
> MPI_Aint typeextent;
> MPI_Type_size(this->datatype, &typesize);
> MPI_Type_extent(this->datatype, &typeextent);
> printf("type size for process rank (%d,%d) is %d doubles, "
>        "type extent is %d doubles (up to %d).",
>        pr, pc, typesize/(int)sizeof(double),
>        (int)(typeextent/sizeof(double)), nx*ny);
> MPI_File_open(blacs_comm, (char*)filename, MPI_MODE_RDWR,
>               MPI_INFO_NULL, &this->fid);
> MPI_File_set_view(this->fid, this->offset + i*nx*ny*sizeof(double),
>                   MPI_DOUBLE, this->datatype, "native", MPI_INFO_NULL);
>
>
> This works well when used like this, but the problem is that the matrix
> itself is written to disk in column-major order, so I want to set up
> the read as if the matrix were transposed, that is:
>
> MPI_Barrier(blacs_comm);
> // size of each matrix
> int *array_of_gsizes = new int[2];
> array_of_gsizes[0] = this->ny;
> array_of_gsizes[1] = this->nx;
> // block-cyclic distribution used by ScaLAPACK
> int *array_of_distrs = new int[2];
> array_of_distrs[0] = MPI_DISTRIBUTE_CYCLIC;
> array_of_distrs[1] = MPI_DISTRIBUTE_CYCLIC;
> int *array_of_dargs = new int[2];
> array_of_dargs[0] = BLOCK_SIZE;
> array_of_dargs[1] = BLOCK_SIZE;
> int *array_of_psizes = new int[2];
> array_of_psizes[0] = Pr;
> array_of_psizes[1] = Pc;
> int rank = pr + pc*Pr;
> MPI_Type_create_darray(Pr*Pc, rank, 2,
>                        array_of_gsizes, array_of_distrs, array_of_dargs,
>                        array_of_psizes, MPI_ORDER_C, MPI_DOUBLE,
>                        &this->datatype);
> MPI_Type_commit(&this->datatype);
> MPI_Type_size(this->datatype, &typesize);
> MPI_Type_extent(this->datatype, &typeextent);
> printf("type size for process rank (%d,%d) is %d doubles, "
>        "type extent is %d doubles (up to %d).",
>        pr, pc, typesize/(int)sizeof(double),
>        (int)(typeextent/sizeof(double)), nx*ny);
> MPI_File_open(blacs_comm, (char*)filename, MPI_MODE_RDWR,
>               MPI_INFO_NULL, &this->fid);
> MPI_File_set_view(this->fid, this->offset + i*nx*ny*sizeof(double),
>                   MPI_DOUBLE, this->datatype, "native", MPI_INFO_NULL);
>
> To my surprise, this code crashes while calling
> MPI_File_set_view()!!! And before you ask, I did try switching
> MPI_ORDER_C to MPI_ORDER_FORTRAN, and I got the same results I am
> reporting here.
>
> Also, I am quite intrigued by the text output of each of these
> versions: the first one reports:
>
> type size for process rank (0,0) is 32 doubles, type extent is 91 doubles (up to 91).
> type size for process rank (1,0) is 20 doubles, type extent is 119 doubles (up to 91).
> type size for process rank (0,1) is 24 doubles, type extent is 95 doubles (up to 91).
> type size for process rank (1,1) is 15 doubles, type extent is 123 doubles (up to 91).
>
> Does anybody know why the extents are not equal?
>
> Even weirder, the second one reports:
>
> type size for process rank (0,0) is 32 doubles, type extent is 91 doubles (up to 91).
> type size for process rank (1,0) is 20 doubles, type extent is 95 doubles (up to 91).
> type size for process rank (0,1) is 24 doubles, type extent is 143 doubles (up to 91).
> type size for process rank (1,1) is 15 doubles, type extent is 147 doubles (up to 91).
>
> The extents changed! I think this is somehow related to the subsequent
> crash in MPI_File_set_view(), but that is as far as my understanding goes...
>
> Any clue about what is happening? I attach the trace below.
>
> Best,
> A
>
> --------------------------------------------------------------------------------
> Antonio Molins, PhD Candidate
> Medical Engineering and Medical Physics
> Harvard - MIT Division of Health Sciences and Technology
> --
> "When a traveler reaches a fork in the road,
> the â„“1 -norm tells him to take either one way or the other,
> but the â„“2 -norm instructs him to head off into the bushes. "
>
> John F. Claerbout and Francis Muir, 1973
> --------------------------------------------------------------------------------
>
> *** glibc detected *** double free or corruption (!prev):
> 0x0000000000cf4130 ***
> [login4:26709] *** Process received signal ***
> [login4:26708] *** Process received signal ***
> [login4:26708] Signal: Aborted (6)
> [login4:26708] Signal code: (-6)
> [login4:26709] Signal: Segmentation fault (11)
> [login4:26709] Signal code: Address not mapped (1)
> [login4:26709] Failing at address: 0x18
> [login4:26708] [ 0] /lib64/tls/libpthread.so.0 [0x36ff10c5b0]
> [login4:26708] [ 1] /lib64/tls/libc.so.6(gsignal+0x3d) [0x36fe62e26d]
> [login4:26708] [ 2] /lib64/tls/libc.so.6(abort+0xfe) [0x36fe62fa6e]
> [login4:26708] [ 3] /lib64/tls/libc.so.6 [0x36fe6635f1]
> [login4:26708] [ 4] /lib64/tls/libc.so.6 [0x36fe6691fe]
> [login4:26708] [ 5] /lib64/tls/libc.so.6(__libc_free+0x76)
> [0x36fe669596]
> [login4:26708] [ 6] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.0
> [0x2a962cc4ae]
> [login4:26708] [ 7] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.
> 0(ompi_ddt_destroy+0x65) [0x2a962cd31d]
> [login4:26708] [ 8] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.
> 0(MPI_Type_free+0x5b) [0x2a962f654f]
> [login4:26708] [ 9] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/
> mca_io_romio.so(ADIOI_Flatten+0x1804) [0x2aa4603612]
> [login4:26708] [10] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/
> mca_io_romio.so(ADIOI_Flatten_datatype+0xe7) [0x2aa46017fd]
> [login4:26708] [11] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/
> mca_io_romio.so(ADIO_Set_view+0x14f) [0x2aa45ecb57]
> [login4:26708] [12] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/
> mca_io_romio.so(mca_io_romio_dist_MPI_File_set_view+0x1dd)
> [0x2aa46088a9]
> [login4:26708] [13] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/
> mca_io_romio.so [0x2aa45ec288]
> [login4:26708] [14] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.
> 0(MPI_File_set_view+0x53) [0x2a963002ff]
> [login4:26708] [15] ./bin/test2(_ZN14pMatCollection3getEiP7pMatrix
> +0xc3) [0x42a50b]
> [login4:26708] [16] ./bin/test2(main+0xc2e) [0x43014a]
> [login4:26708] [17] /lib64/tls/libc.so.6(__libc_start_main+0xdb)
> [0x36fe61c40b]
> [login4:26708] [18] ./bin/test2(_ZNSt8ios_base4InitD1Ev+0x42)
> [0x41563a]
> [login4:26708] *** End of error message ***
> [login4:26709] [ 0] /lib64/tls/libpthread.so.0 [0x36ff10c5b0]
> [login4:26709] [ 1] /lib64/tls/libc.so.6 [0x36fe66882b]
> [login4:26709] [ 2] /lib64/tls/libc.so.6 [0x36fe668f8d]
> [login4:26709] [ 3] /lib64/tls/libc.so.6(__libc_free+0x76)
> [0x36fe669596]
> [login4:26709] [ 4] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.0
> [0x2a962cc4ae]
> [login4:26709] [ 5] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.
> 0(ompi_ddt_release_args+0x93) [0x2a962d5641]
> [login4:26709] [ 6] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.0
> [0x2a962cc514]
> [login4:26709] [ 7] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.
> 0(ompi_ddt_release_args+0x93) [0x2a962d5641]
> [login4:26709] [ 8] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.0
> [0x2a962cc514]
> [login4:26709] [ 9] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.
> 0(ompi_ddt_destroy+0x65) [0x2a962cd31d]
> [login4:26709] [10] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.
> 0(MPI_Type_free+0x5b) [0x2a962f654f]
> [login4:26709] [11] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/
> mca_io_romio.so(ADIOI_Flatten+0x147) [0x2aa4601f55]
> [login4:26709] [12] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/
> mca_io_romio.so(ADIOI_Flatten+0x1569) [0x2aa4603377]
> [login4:26709] [13] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/
> mca_io_romio.so(ADIOI_Flatten_datatype+0xe7) [0x2aa46017fd]
> [login4:26709] [14] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/
> mca_io_romio.so(ADIO_Set_view+0x14f) [0x2aa45ecb57]
> [login4:26709] [15] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/
> mca_io_romio.so(mca_io_romio_dist_MPI_File_set_view+0x1dd)
> [0x2aa46088a9]
> [login4:26709] [16] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/
> mca_io_romio.so [0x2aa45ec288]
> [login4:26709] [17] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.
> 0(MPI_File_set_view+0x53) [0x2a963002ff]
> [login4:26709] [18] ./bin/test2(_ZN14pMatCollection3getEiP7pMatrix
> +0xc3) [0x42a50b]
> [login4:26709] [19] ./bin/test2(main+0xc2e) [0x43014a]
> [login4:26709] [20] /lib64/tls/libc.so.6(__libc_start_main+0xdb)
> [0x36fe61c40b]
> [login4:26709] [21] ./bin/test2(_ZNSt8ios_base4InitD1Ev+0x42)
> [0x41563a]
> [login4:26709] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 2 with PID 26708 on node
> login4.ranger.tacc.utexas.edu exited on signal 6 (Aborted).
> --------------------------------------------------------------------------
>
>
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--------------------------------------------------------------------------------
Antonio Molins, PhD Candidate
Medical Engineering and Medical Physics
Harvard - MIT Division of Health Sciences and Technology

--
"Y así del poco dormir y del mucho leer,
se le secó el cerebro de manera que vino
a perder el juicio".
                                        Miguel de Cervantes
--------------------------------------------------------------------------------