Hi again,

Using MPI_Type_get_true_extent(), I changed the way I report the type size and extent to:

int typesize;
MPI_Aint typeextent, typelb;  // MPI_Aint, not long: that's what the true-extent call takes
MPI_Type_size(this->datatype, &typesize);
MPI_Type_get_true_extent(this->datatype, &typelb, &typeextent);
//MPI_Type_lb(this->datatype, &typelb);
//MPI_Type_extent(this->datatype, &typeextent);
printf("\ntype size for process rank (%d,%d) is %d doubles, "
       "type extent is %d doubles (up to %d), range is [%d, %d].\n",
       pr, pc, typesize/(int)sizeof(double),
       (int)(typeextent/sizeof(double)), nx*ny,
       (int)(typelb/sizeof(double)),
       (int)((typelb+typeextent)/sizeof(double)));

This now gives me the correct answers in both situations. For the first one (which works):

type size for process rank (1,0) is 20 doubles, type extent is 60 doubles (up to 91), range is [28, 88].
type size for process rank (0,0) is 32 doubles, type extent is 81 doubles (up to 91), range is [0, 81].
type size for process rank (0,1) is 24 doubles, type extent is 80 doubles (up to 91), range is [4, 84].
type size for process rank (1,1) is 15 doubles, type extent is 59 doubles (up to 91), range is [32, 91].

For the second one (printed just before I hit the same double-free error in MPI_File_set_view()):

type size for process rank (1,0) is 20 doubles, type extent is 48 doubles (up to 91), range is [4, 52].
type size for process rank (0,0) is 32 doubles, type extent is 51 doubles (up to 91), range is [0, 51].
type size for process rank (0,1) is 24 doubles, type extent is 38 doubles (up to 91), range is [52, 90].
type size for process rank (1,1) is 15 doubles, type extent is 35 doubles (up to 91), range is [56, 91].

Can anybody give me a hint here? Is there a bug in MPI_Type_create_darray() I should be aware of?

Best,
A

On Oct 30, 2008, at 5:21 PM, Antonio Molins wrote:

Hi all,

I am having some trouble with MPI_Type_create_darray(). I want to map data to a 2x2 block-cyclic configuration in C, using this code:

MPI_Barrier(blacs_comm);
// size of each matrix
int *array_of_gsizes = new int[2];
array_of_gsizes[0] = this->nx;
array_of_gsizes[1] = this->ny;
// block-cyclic distribution used by ScaLAPACK
int *array_of_distrs = new int[2];
array_of_distrs[0] = MPI_DISTRIBUTE_CYCLIC;
array_of_distrs[1] = MPI_DISTRIBUTE_CYCLIC;
int *array_of_dargs = new int[2];
array_of_dargs[0] = BLOCK_SIZE;
array_of_dargs[1] = BLOCK_SIZE;
int *array_of_psizes = new int[2];
array_of_psizes[0] = Pr;
array_of_psizes[1] = Pc;
int rank = pc + pr*Pc;
MPI_Type_create_darray(Pr*Pc, rank, 2, array_of_gsizes, array_of_distrs,
                       array_of_dargs, array_of_psizes, MPI_ORDER_C,
                       MPI_DOUBLE, &this->datatype);
MPI_Type_commit(&this->datatype);
int typesize;
MPI_Aint typeextent;
MPI_Type_size(this->datatype, &typesize);
MPI_Type_extent(this->datatype, &typeextent);
printf("type size for process rank (%d,%d) is %d doubles, "
       "type extent is %d doubles (up to %d).\n",
       pr, pc, typesize/(int)sizeof(double),
       (int)(typeextent/sizeof(double)), nx*ny);
MPI_File_open(blacs_comm, (char*)filename, MPI_MODE_RDWR,
              MPI_INFO_NULL, &this->fid);
MPI_File_set_view(this->fid, this->offset + i*nx*ny*sizeof(double),
                  MPI_DOUBLE, this->datatype, "native", MPI_INFO_NULL);


This works well as used here, but the problem is that the matrix is stored on disk in column-major order, so I want to read it as if it were transposed, that is:

MPI_Barrier(blacs_comm);
// size of each matrix
int *array_of_gsizes = new int[2];
array_of_gsizes[0] = this->ny;
array_of_gsizes[1] = this->nx;
// block-cyclic distribution used by ScaLAPACK
int *array_of_distrs = new int[2];
array_of_distrs[0] = MPI_DISTRIBUTE_CYCLIC;
array_of_distrs[1] = MPI_DISTRIBUTE_CYCLIC;
int *array_of_dargs = new int[2];
array_of_dargs[0] = BLOCK_SIZE;
array_of_dargs[1] = BLOCK_SIZE;
int *array_of_psizes = new int[2];
array_of_psizes[0] = Pr;
array_of_psizes[1] = Pc;
int rank = pr + pc*Pr;
MPI_Type_create_darray(Pr*Pc, rank, 2, array_of_gsizes, array_of_distrs,
                       array_of_dargs, array_of_psizes, MPI_ORDER_C,
                       MPI_DOUBLE, &this->datatype);
MPI_Type_commit(&this->datatype);
MPI_Type_size(this->datatype, &typesize);
MPI_Type_extent(this->datatype, &typeextent);
printf("type size for process rank (%d,%d) is %d doubles, "
       "type extent is %d doubles (up to %d).\n",
       pr, pc, typesize/(int)sizeof(double),
       (int)(typeextent/sizeof(double)), nx*ny);
MPI_File_open(blacs_comm, (char*)filename, MPI_MODE_RDWR,
              MPI_INFO_NULL, &this->fid);
MPI_File_set_view(this->fid, this->offset + i*nx*ny*sizeof(double),
                  MPI_DOUBLE, this->datatype, "native", MPI_INFO_NULL);

To my surprise, this code crashes inside MPI_File_set_view()! And before you ask: I did try switching MPI_ORDER_C to MPI_ORDER_FORTRAN, and I got the same results I am reporting here.

Also, I am quite intrigued by the text output of these two programs. The first one reports:

type size for process rank (0,0) is 32 doubles, type extent is 91 doubles (up to 91).
type size for process rank (1,0) is 20 doubles, type extent is 119 doubles (up to 91).
type size for process rank (0,1) is 24 doubles, type extent is 95 doubles (up to 91).
type size for process rank (1,1) is 15 doubles, type extent is 123 doubles (up to 91).

Does anybody know why the extents are not all equal?

Even weirder, the second one reports:

type size for process rank (0,0) is 32 doubles, type extent is 91 doubles (up to 91).
type size for process rank (1,0) is 20 doubles, type extent is 95 doubles (up to 91).
type size for process rank (0,1) is 24 doubles, type extent is 143 doubles (up to 91).
type size for process rank (1,1) is 15 doubles, type extent is 147 doubles (up to 91).

The extents changed! I think this is somehow related to the subsequent crash in MPI_File_set_view(), but that's as far as I can figure out...

Any clue about what is happening? I attach the trace below.

Best,
A

--------------------------------------------------------------------------------
Antonio Molins, PhD Candidate
Medical Engineering and Medical Physics
Harvard - MIT Division of Health Sciences and Technology
--
"When a traveler reaches a fork in the road, 
the ℓ1 -norm tells him to take either one way or the other, 
but the ℓ2 -norm instructs him to head off into the bushes. "

John F. Claerbout and Francis Muir, 1973 
--------------------------------------------------------------------------------

*** glibc detected *** double free or corruption (!prev): 0x0000000000cf4130 ***
[login4:26709] *** Process received signal ***
[login4:26708] *** Process received signal ***
[login4:26708] Signal: Aborted (6)
[login4:26708] Signal code:  (-6)
[login4:26709] Signal: Segmentation fault (11)
[login4:26709] Signal code: Address not mapped (1)
[login4:26709] Failing at address: 0x18
[login4:26708] [ 0] /lib64/tls/libpthread.so.0 [0x36ff10c5b0]
[login4:26708] [ 1] /lib64/tls/libc.so.6(gsignal+0x3d) [0x36fe62e26d]
[login4:26708] [ 2] /lib64/tls/libc.so.6(abort+0xfe) [0x36fe62fa6e]
[login4:26708] [ 3] /lib64/tls/libc.so.6 [0x36fe6635f1]
[login4:26708] [ 4] /lib64/tls/libc.so.6 [0x36fe6691fe]
[login4:26708] [ 5] /lib64/tls/libc.so.6(__libc_free+0x76) [0x36fe669596]
[login4:26708] [ 6] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.0 [0x2a962cc4ae]
[login4:26708] [ 7] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.0(ompi_ddt_destroy+0x65) [0x2a962cd31d]
[login4:26708] [ 8] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.0(MPI_Type_free+0x5b) [0x2a962f654f]
[login4:26708] [ 9] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/mca_io_romio.so(ADIOI_Flatten+0x1804) [0x2aa4603612]
[login4:26708] [10] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/mca_io_romio.so(ADIOI_Flatten_datatype+0xe7) [0x2aa46017fd]
[login4:26708] [11] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/mca_io_romio.so(ADIO_Set_view+0x14f) [0x2aa45ecb57]
[login4:26708] [12] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/mca_io_romio.so(mca_io_romio_dist_MPI_File_set_view+0x1dd) [0x2aa46088a9]
[login4:26708] [13] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/mca_io_romio.so [0x2aa45ec288]
[login4:26708] [14] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.0(MPI_File_set_view+0x53) [0x2a963002ff]
[login4:26708] [15] ./bin/test2(_ZN14pMatCollection3getEiP7pMatrix+0xc3) [0x42a50b]
[login4:26708] [16] ./bin/test2(main+0xc2e) [0x43014a]
[login4:26708] [17] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x36fe61c40b]
[login4:26708] [18] ./bin/test2(_ZNSt8ios_base4InitD1Ev+0x42) [0x41563a]
[login4:26708] *** End of error message ***
[login4:26709] [ 0] /lib64/tls/libpthread.so.0 [0x36ff10c5b0]
[login4:26709] [ 1] /lib64/tls/libc.so.6 [0x36fe66882b]
[login4:26709] [ 2] /lib64/tls/libc.so.6 [0x36fe668f8d]
[login4:26709] [ 3] /lib64/tls/libc.so.6(__libc_free+0x76) [0x36fe669596]
[login4:26709] [ 4] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.0 [0x2a962cc4ae]
[login4:26709] [ 5] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.0(ompi_ddt_release_args+0x93) [0x2a962d5641]
[login4:26709] [ 6] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.0 [0x2a962cc514]
[login4:26709] [ 7] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.0(ompi_ddt_release_args+0x93) [0x2a962d5641]
[login4:26709] [ 8] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.0 [0x2a962cc514]
[login4:26709] [ 9] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.0(ompi_ddt_destroy+0x65) [0x2a962cd31d]
[login4:26709] [10] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.0(MPI_Type_free+0x5b) [0x2a962f654f]
[login4:26709] [11] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/mca_io_romio.so(ADIOI_Flatten+0x147) [0x2aa4601f55]
[login4:26709] [12] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/mca_io_romio.so(ADIOI_Flatten+0x1569) [0x2aa4603377]
[login4:26709] [13] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/mca_io_romio.so(ADIOI_Flatten_datatype+0xe7) [0x2aa46017fd]
[login4:26709] [14] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/mca_io_romio.so(ADIO_Set_view+0x14f) [0x2aa45ecb57]
[login4:26709] [15] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/mca_io_romio.so(mca_io_romio_dist_MPI_File_set_view+0x1dd) [0x2aa46088a9]
[login4:26709] [16] /opt/apps/intel10_1/openmpi/1.3/lib/openmpi/mca_io_romio.so [0x2aa45ec288]
[login4:26709] [17] /opt/apps/intel10_1/openmpi/1.3/lib/libmpi.so.0(MPI_File_set_view+0x53) [0x2a963002ff]
[login4:26709] [18] ./bin/test2(_ZN14pMatCollection3getEiP7pMatrix+0xc3) [0x42a50b]
[login4:26709] [19] ./bin/test2(main+0xc2e) [0x43014a]
[login4:26709] [20] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x36fe61c40b]
[login4:26709] [21] ./bin/test2(_ZNSt8ios_base4InitD1Ev+0x42) [0x41563a]
[login4:26709] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 26708 on node login4.ranger.tacc.utexas.edu exited on signal 6 (Aborted).
--------------------------------------------------------------------------




_______________________________________________
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

--------------------------------------------------------------------------------
Antonio Molins, PhD Candidate
Medical Engineering and Medical Physics
Harvard - MIT Division of Health Sciences and Technology
--
"And so, from so little sleeping and so much reading,
his brain dried up, and he came
to lose his wits."
                                       Miguel de Cervantes
--------------------------------------------------------------------------------