
Open MPI Development Mailing List Archives


Subject: [OMPI devel] openMPI 1.4.2: mpi_write fails on NFSv3 crossmounts
From: Oliver Deppert (Oliver.Deppert_at_[hidden])
Date: 2010-08-24 09:03:36

General information:
3-node Opteron cluster, 24 CPUs, Mellanox InfiniBand 10Gb interconnect
Debian Lenny 5.0
self-built kernel, all NFS functions available from the kernel side
self-built NFS-utils 1.2.2 from the Debian sid source: nfs-kernel-server,
nfs-server with working lockd
fcntl() and locking are available on all NFS clients, tested with the
perl script (attached below)

openMPI 1.4.2 (built with GCC 4.3.2)
configure options:
./configure --prefix=/opt/openMPI_gnu_4.3.2 --sysconfdir=/etc
--localstatedir=/var --with-libnuma=/usr --with-libnuma-libdir=/usr/lib
--enable-mpirun-prefix-by-default --enable-sparse-groups --enable-static
--enable-cxx-exceptions --with-wrapper-cflags='-O3 -march=opteron'
--with-wrapper-cxxflags='-O3 -march=opteron' --with-wrapper-fflags='-O3
-march=opteron' --with-wrapper-fcflags='-O3 -march=opteron'
--with-openib --with-gnu-ld CFLAGS='-O3 -march=opteron' CXXFLAGS='-O3
-march=opteron' FFLAGS='-O3 -march=opteron' FCFLAGS='-O3 -march=opteron'


Dear openMPI developers,

I've found a bug in the current stable release of openMPI 1.4.2 which is
related to the MPI_FILE_WRITE function when the program is executed on an
NFSv3 crossmount. I've attached a small Fortran code snippet (testmpi.f)
which uses MPI_FILE_WRITE to create a file "test.dat": every MPI rank
writes the values {1,2,3,4,5,6} as binary MPI_REALs at its own
displacement within the file.
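
For reproduction, the snippet can be built and run roughly like this (a
sketch assuming the usual Open MPI wrappers mpif90 and mpiexec are in the
PATH; od -f is only used to dump the resulting reals for checking):

  mpif90 testmpi.f -o testmpi
  mpiexec -np 2 ./testmpi
  od -f test.dat   # expect the values 1.0 ... 6.0 once per rank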

When I execute this code on a GlusterFS share, everything works without any problems at all...

The problem is: when I compile and execute this program for two nodes
on an NFS crossmount with openMPI, I get the following MPI error:
[ppclus02:23440] *** An error occurred in MPI_Bcast
[ppclus02:23440] *** on communicator MPI COMMUNICATOR 3 DUP FROM 0
[ppclus02:23440] *** MPI_ERR_TRUNCATE: message truncated
[ppclus02:23440] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
mpiexec has exited due to process rank 1 with PID 23440 on
node exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpiexec (as reported here).

My first educated guess was that my NFS crossmounts aren't capable of
using fcntl() to lock the file as needed by MPI_FILE_WRITE. So I gave
the perl script quoted below a try. The result was: fcntl() and
NFS file locking work...

For comparison, I also tried the recent unstable version of MPICH2 v1.3a2
on the same NFS crossmount. With MPICH2 it also works without any
problems on NFSv3.

Thanks for your help.

Best regards,
Oliver Deppert

perl script (to test NFS fcntl() file locking):

  use Fcntl;
  $fn = "locktest.lock";
  open FH, ">$fn" or die "Cannot open $fn: $!";
  print "Testing fcntl...\n";
  # struct flock: F_WRLCK = exclusive write lock over the entire file
  # (the pack layout "SSLLL" is platform dependent)
  @list = (F_WRLCK,0,0,0,0);
  $struct = pack("SSLLL",@list);
  fcntl(FH,&F_SETLKW,$struct) or die("cannot lock because: $!\n");
  print "fcntl() lock acquired\n";
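
To exercise the crossmount itself, the script has to be run from a
directory on the NFS mount; the path and filename below are only
placeholders:

  cd /path/to/nfs/crossmount && perl locktest.pl   # placeholder paths

Running it concurrently on two clients should make the second invocation
wait in F_SETLKW until the first one releases the lock (i.e. exits).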


testmpi.f (Fortran 90 code snippet to test MPI_FILE_WRITE on NFSv3):
       program WRITE_FILE

       implicit none
       include 'mpif.h'

       integer info,pec
       integer npe,mpe,mtag

       integer :: realsize,file,displace,displaceloc
       integer(kind=MPI_OFFSET_KIND) :: disp
       integer :: status(MPI_STATUS_SIZE)
       real(kind=4) :: locidx(6)


       call MPI_INIT(info)
       call MPI_COMM_SIZE(MPI_COMM_WORLD,npe,info)
       call MPI_COMM_RANK(MPI_COMM_WORLD,mpe,info)

c fill the local buffer with the values 1..6; the message tag and the
c local element count per rank are set here (example values)
       do pec=1,6
          locidx(pec) = real(pec)
       enddo
       displaceloc = 6
       mtag = 1
       displace = 0

       !send data offset (local element count) to every other rank
       do pec=0,mpe-1
          CALL MPI_SEND(displaceloc,1,MPI_INTEGER,
     &                  pec,mtag,MPI_COMM_WORLD,info)
       enddo
       do pec=mpe+1,npe-1
          CALL MPI_SEND(displaceloc,1,MPI_INTEGER,
     &                  pec,mtag,MPI_COMM_WORLD,info)
       enddo

       !get data offset: sum the counts of all lower ranks
       do pec=0,mpe-1
          CALL MPI_RECV(displaceloc,1,MPI_INTEGER,pec,mtag,
     &                  MPI_COMM_WORLD,status,info)
          displace = displace + displaceloc
       enddo


       CALL MPI_TYPE_EXTENT(MPI_REAL,realsize,info)

       !open file (shared by all ranks)
       CALL MPI_FILE_OPEN(MPI_COMM_WORLD,'test.dat',
     &      MPI_MODE_WRONLY+MPI_MODE_CREATE,MPI_INFO_NULL,file,info)

       !set file view (displacement in bytes)
       disp = displace*realsize
       CALL MPI_FILE_SET_VIEW(file,disp,MPI_REAL,
     &      MPI_REAL,'native',MPI_INFO_NULL,info)

       !write out data
       CALL MPI_FILE_WRITE(file,locidx,6,MPI_REAL,
     &      status,info)

       !wait until all processes are done
       !sync-barrier-sync recommended by the MPI consortium (2010) to
       !guarantee file consistency
       call MPI_FILE_SYNC(file,info)
       call MPI_BARRIER(MPI_COMM_WORLD,info)
       CALL MPI_FILE_SYNC(file,info)
       !close file
       call MPI_FILE_CLOSE(file,info)

       call MPI_FINALIZE(info)

       end