Hi Folks,
I have a run on 256 PEs on a Lustre file system with the following code:
[snip]
integer :: mype,npe,pe_min,pe_max,pe_prev,pe_next,mpi_my_real, &
comm=mpi_comm_world,status(mpi_status_size),error, &
mpi_realsize, thefile
integer (kind=MPI_OFFSET_KIND) disp
logical :: pe0,prl
! *************************************************************************
call mpi_init(error)
call mpi_comm_rank(comm,mype,error)
call mpi_comm_size(comm, npe,error)
call mpi_type_extent(mpi_real, mpi_realsize, error);
call mpi_type_size(MPI_REAL8, mpi_realsize, error)
pe0=mype==0
.
.
.
disp = mype*lu*mpi_realsize
call mpi_barrier(comm,error)
call mpi_file_open(comm,'output-parallel/dump.dat', &
MPI_MODE_RDONLY, mpi_info_null, thefile, error)
call mpi_file_write_at(thefile, disp, u(1,nx1,ny1,nz1), lu, &
MPI_REAL8, mpi_status_ignore, error)
call mpi_file_close(thefile, error)
call mpi_barrier(comm,error)
[snip]
where lu is an integer that does not itself exceed the 32-bit limit. If
the output file grows beyond the 32-bit limit, i.e. its size is larger
than 2**31 bytes (roughly 2.1 GBytes), I get only a file of 327 MBytes
instead of the expected 181 GBytes for a checkpoint. This of course leads
to a segfault when restarting. I am afraid this has something to do with
the 32-bit limit on the file size, which might be calculated wrongly in
my offset (disp in my code) passed to mpi_file_write_at.
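To make the suspicion concrete, here is a minimal standalone sketch (no MPI, not my actual code) of an offset computed in 64-bit arithmetic. The values of mype and lu are made up for illustration, int64 stands in for MPI_OFFSET_KIND (which I am assuming is 8 bytes on this system), and realsize is a hypothetical name for what my code calls mpi_realsize. In my code the product mype*lu*mpi_realsize consists only of default integers, so if default integers are 32 bit here, it would presumably be evaluated in 32 bit before being assigned to disp.

```fortran
program offset_overflow_sketch
  use, intrinsic :: iso_fortran_env, only: int32, int64
  implicit none

  ! Hypothetical values, chosen only so that the product exceeds 2**31 - 1.
  integer(int32), parameter :: mype = 255        ! last rank of a 256-PE run
  integer(int32), parameter :: lu = 88000000     ! made-up element count per PE
  integer(int32), parameter :: realsize = 8      ! bytes in one 8-byte real
  integer(int64) :: disp                         ! stand-in for MPI_OFFSET_KIND

  ! Converting the first factor to 64 bit makes the whole product 64 bit.
  ! Writing mype*lu*realsize instead would evaluate the product in 32-bit
  ! arithmetic before the assignment to disp.
  disp = int(mype, int64) * lu * realsize

  print *, 'largest 32-bit integer:', huge(0_int32)
  print *, '64-bit offset in bytes:', disp

  ! Sanity check: with these values the offset must not fit into 32 bit.
  if (disp <= int(huge(0_int32), int64)) error stop 'offset fits in 32 bit'
end program offset_overflow_sketch
```

If this is indeed the cause, the corresponding line in my code would presumably become disp = int(mype, MPI_OFFSET_KIND)*lu*mpi_realsize, but I have not verified this.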
Any ideas on how I can narrow down the cause of the error, or, even
better, on how to solve it?
Best wishes
Alexander