Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] problem with MPI-IO at filesizes greater then the 32 Bit limit...
From: pascal.deveze_at_[hidden]
Date: 2011-09-05 03:43:30


Hi,

I am not sure I understand what you are doing.

users-bounces_at_[hidden] wrote on 03/09/2011 11:05:04:

> From: alibeck <alexander.beck-ratzka_at_[hidden]>
> To: Open MPI Users <users_at_[hidden]>
> Date: 03/09/2011 11:05
> Subject: [OMPI users] problem with MPI-IO at filesizes greater then
> the 32 Bit limit...
> Sent by: users-bounces_at_[hidden]
>
> Hi Folks,
>
> I have a run on 256 PEs on a Lustre file system with the following
> code:
>
> [snip]
> integer :: mype,npe,pe_min,pe_max,pe_prev,pe_next,mpi_my_real, &
> comm=mpi_comm_world,status(mpi_status_size),error, &
> mpi_realsize, thefile
> integer (kind=MPI_OFFSET_KIND) disp
>
> logical :: pe0,prl
>
>
> ! *************************************************************************
>
> call mpi_init(error)
> call mpi_comm_rank(comm,mype,error)
> call mpi_comm_size(comm, npe,error)
>
> call mpi_type_extent(mpi_real, mpi_realsize, error);

mpi_type_extent is deprecated; use mpi_type_get_extent instead
(this will not solve your problem, though).
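For reference, a minimal sketch of the replacement call. The program name and
variable names are only illustrative; the point is that mpi_type_get_extent
returns the lower bound and the extent as integers of kind MPI_ADDRESS_KIND,
not as default integers.

```fortran
program extent_demo
  use mpi
  implicit none
  integer :: ierr, type_size
  ! Both results must be declared with the address kind.
  integer(kind=MPI_ADDRESS_KIND) :: lb, extent

  call MPI_Init(ierr)
  ! Non-deprecated replacement for mpi_type_extent.
  call MPI_Type_get_extent(MPI_REAL8, lb, extent, ierr)
  ! mpi_type_size still takes a default integer.
  call MPI_Type_size(MPI_REAL8, type_size, ierr)
  print *, 'lower bound =', lb, ' extent =', extent, ' size =', type_size
  call MPI_Finalize(ierr)
end program extent_demo
```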

> call mpi_type_size(MPI_REAL8, mpi_realsize, error)
>
> pe0=mype==0
>
> .
> .
> .
> disp = mype*lu*mpi_realsize

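So disp = 0 on rank 0. On the other ranks, the product
mype*lu*mpi_realsize is evaluated in default integer arithmetic (usually
32 bits) before it is assigned to disp, so it can overflow once it
exceeds 2**31 - 1.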
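One way to keep the offset computation in 64-bit arithmetic is to convert each
factor to MPI_OFFSET_KIND before multiplying. This is only a sketch: the values
of mype, lu and realsize below are hypothetical, chosen so that the product is
far larger than a default 32-bit integer can hold.

```fortran
program offset_demo
  use mpi
  implicit none
  integer :: mype, lu, realsize
  integer(kind=MPI_OFFSET_KIND) :: disp

  ! Hypothetical values, not taken from the original application.
  mype     = 255
  lu       = 88000000
  realsize = 8

  ! Convert every factor first, so the multiplication itself is done
  ! with MPI_OFFSET_KIND integers and not with default integers.
  disp = int(mype, MPI_OFFSET_KIND) * int(lu, MPI_OFFSET_KIND) &
       * int(realsize, MPI_OFFSET_KIND)

  print *, 'byte offset for rank', mype, ':', disp
end program offset_demo
```

Writing `disp = mype*lu*realsize` instead would do the multiplication with
default integers and only widen the (possibly overflowed) result.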

>
> call mpi_barrier(comm,error)
> call mpi_file_open(comm,'output-parallel/dump.dat', &
> MPI_MODE_RDONLY, mpi_info_null, thefile, error)

You open the file with the "read only" flag (MPI_MODE_RDONLY).

> call mpi_file_write_at(thefile, disp, u(1,nx1,ny1,nz1), lu, &
> MPI_REAL8, mpi_status_ignore, error)

You then write to a file that was opened "read only".
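A file that is to be written would normally be opened with a write access
mode. A minimal sketch, assuming the file may not exist yet and should be
created; the file name 'dump.dat' and the variable names are only
illustrative.

```fortran
program open_for_write
  use mpi
  implicit none
  integer :: ierr, fh, amode

  call MPI_Init(ierr)

  ! Combine the access-mode flags: write-only, create the file if missing.
  amode = ior(MPI_MODE_WRONLY, MPI_MODE_CREATE)

  call MPI_File_open(MPI_COMM_WORLD, 'dump.dat', amode, &
                     MPI_INFO_NULL, fh, ierr)
  if (ierr /= MPI_SUCCESS) then
    print *, 'MPI_File_open failed, error code', ierr
  else
    call MPI_File_close(fh, ierr)
  end if

  call MPI_Finalize(ierr)
end program open_for_write
```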

> call mpi_file_close(thefile, error)
> call mpi_barrier(comm,error)
>
>
> [snip]
>
> where lu is an integer which does not exceed the limit. If I am
> exceeding the 32 Bit limit, which means that the size of my output file
> is larger than 2**31 bytes (roughly 2.1 GBytes), I am getting only a
> file with a size of 327 MBytes instead of the expected 181 GBytes for a
> checkpoint. This leads of course to a segfault when restarting. I am
> afraid this has something to do with the 32 Bit limit of my filesize,
> which might be calculated wrongly in my offset (which is disp in my code)
> in mpi_file_write_at.

I do not understand why you are expecting a size of 181 Gbytes.

If lu becomes negative (this is the case when the 2**31 limit is reached),
the mpi_file_write_at routine should return an error. Do you check the
returned error code?
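Checking the error code is worthwhile here because MPI file routines return
errors by default (the default error handler on files is MPI_ERRORS_RETURN)
and do not abort the job. A minimal sketch of such a check follows; the
helper subroutine check, the file name 'check.dat' and the tiny buffer size
are hypothetical and only serve the illustration.

```fortran
program check_write_error
  use mpi
  implicit none
  integer, parameter :: n = 4            ! hypothetical element count per rank
  integer :: ierr, fh, rank, elem_size, amode
  integer(kind=MPI_OFFSET_KIND) :: disp
  double precision :: buf(n)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Type_size(MPI_DOUBLE_PRECISION, elem_size, ierr)
  buf = dble(rank)

  ! Byte offset of this rank's block, computed with offset-kind integers.
  disp = int(rank, MPI_OFFSET_KIND) * int(n, MPI_OFFSET_KIND) &
       * int(elem_size, MPI_OFFSET_KIND)

  amode = ior(MPI_MODE_WRONLY, MPI_MODE_CREATE)
  call MPI_File_open(MPI_COMM_WORLD, 'check.dat', amode, &
                     MPI_INFO_NULL, fh, ierr)
  call check(ierr, 'MPI_File_open')

  call MPI_File_write_at(fh, disp, buf, n, MPI_DOUBLE_PRECISION, &
                         MPI_STATUS_IGNORE, ierr)
  call check(ierr, 'MPI_File_write_at')

  call MPI_File_close(fh, ierr)
  call check(ierr, 'MPI_File_close')
  call MPI_Finalize(ierr)

contains

  ! Print the MPI error text and abort if a call did not succeed.
  subroutine check(code, label)
    integer, intent(in) :: code
    character(len=*), intent(in) :: label
    character(len=MPI_MAX_ERROR_STRING) :: msg
    integer :: msglen, ierr2
    if (code /= MPI_SUCCESS) then
      call MPI_Error_string(code, msg, msglen, ierr2)
      print *, label, ': ', msg(1:msglen)
      call MPI_Abort(MPI_COMM_WORLD, 1, ierr2)
    end if
  end subroutine check

end program check_write_error
```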

>
> Any ideas on how I can isolate the reason for the error, or - even better
> - on how to solve it?
>

With the information you sent, it is difficult to give you a solution. I
advise you to simplify your application as far as possible (fewer than
100 lines would be fine), reproduce the problem, and send us the
reproducer. I could then try to reproduce your problem on my side.

> Best wishes
>
> Alexander