
Open MPI Development Mailing List Archives


Subject: [OMPI devel] openMPI 1.4.2: mpi_write fails on NFSv3 crossmounts
From: Oliver Deppert (Oliver.Deppert_at_[hidden])
Date: 2010-08-24 09:03:36


General information:
------------------------------------
3-node Opteron cluster, 24 CPUs, Mellanox InfiniBand 10 Gb interconnect
Debian Lenny 5.0
self-built kernel from kernel.org: 2.6.32.12, all NFS functions
enabled on the kernel side
self-built nfs-utils 1.2.2 from the Debian sid sources: nfs-kernel-server,
nfs-common

NFS server with a working lockd
fcntl() and locking are available on all NFS clients, tested with a
Perl script (attached)

Open MPI 1.4.2 (built with GCC 4.3.2)
configure options:
./configure --prefix=/opt/openMPI_gnu_4.3.2 --sysconfdir=/etc
--localstatedir=/var --with-libnuma=/usr --with-libnuma-libdir=/usr/lib
--enable-mpirun-prefix-by-default --enable-sparse-groups --enable-static
--enable-cxx-exceptions --with-wrapper-cflags='-O3 -march=opteron'
--with-wrapper-cxxflags='-O3 -march=opteron' --with-wrapper-fflags='-O3
-march=opteron' --with-wrapper-fcflags='-O3 -march=opteron'
--with-openib --with-gnu-ld CFLAGS='-O3 -march=opteron' CXXFLAGS='-O3
-march=opteron' FFLAGS='-O3 -march=opteron' FCFLAGS='-O3 -march=opteron'

=======================================================================================

Dear Open MPI developers,

I've found a bug in the current stable release of Open MPI 1.4.2,
related to MPI_FILE_WRITE in combination with execution on an
NFSv3 crossmount. I've attached a small Fortran code snippet
(testmpi.f) that uses MPI_FILE_WRITE to create a file "test.dat"
containing {1,2,3,4,5,6} in binary, as MPI_REALs written by every
MPI rank it is executed on, each at the correct displacement for
that rank.

When I execute this code on a GlusterFS share, everything works like
a charm; no problems at all.

The problem is that when I compile and execute this program for two
nodes on an NFS crossmount with Open MPI, I get the following MPI error:
[ppclus02:23440] *** An error occurred in MPI_Bcast
[ppclus02:23440] *** on communicator MPI COMMUNICATOR 3 DUP FROM 0
[ppclus02:23440] *** MPI_ERR_TRUNCATE: message truncated
[ppclus02:23440] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
--------------------------------------------------------------------------
mpiexec has exited due to process rank 1 with PID 23440 on
node 192.168.11.2 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpiexec (as reported here).
--------------------------------------------------------------------------

My first educated guess was that my NFS crossmounts weren't capable
of using fcntl() to lock the file as needed by MPI_FILE_WRITE. So I
gave the following Perl script (lock.pl) a try. The result: fcntl()
and NFS file locking work.

For comparison, I also tried the recent unstable version of MPICH2,
v1.3a2, on the same NFS crossmount. With MPICH2 it also works without
any problems on NFSv3.

Thanks for your help.

Best regards,
Oliver Deppert

lock.pl (to test NFS fcntl() file locking)
-----------------------------------------------------------------------------------------------------------------------------------------------------

#!/usr/bin/perl
  use Fcntl;

  my $fn = "locktest.lock";
  open(FH, ">", $fn) or die "Cannot open $fn: $!";
  print "Testing fcntl...\n";
  # struct flock: exclusive write lock (F_WRLCK) on the entire file;
  # note this pack layout is platform-dependent
  my @list = (F_WRLCK,0,0,0,0);
  my $struct = pack("SSLLL",@list);
  fcntl(FH,F_SETLKW,$struct) or die("cannot lock because: $!\n");

------------------------------------------------------------------------------------------------------------------------------------------------------

testmpi.f (Fortran code snippet to test MPI_FILE_WRITE on NFSv3)
-----------------------------------------------------------------------------------------------------------------------------------------------------
       program WRITE_FILE

       implicit none
       include 'mpif.h'

       integer info,pec
       integer npe,mpe,mtag

       integer :: realsize,file,displace,displaceloc
       integer(kind=MPI_OFFSET_KIND) :: disp
       integer :: status(MPI_STATUS_SIZE)
       real(kind=4) :: locidx(6)

c INITIALIZATION

       call MPI_INIT(info)
       call MPI_COMM_SIZE(MPI_COMM_WORLD,npe,info)
       call MPI_COMM_RANK(MPI_COMM_WORLD,mpe,info)

c routine

       mtag=123
       displace=6

       !send data offset
       do pec=0,mpe-1
          CALL MPI_SEND(displace,1,MPI_INTEGER,
     &                  pec,mtag,MPI_COMM_WORLD,info)
       enddo
       do pec=mpe+1,npe-1
          CALL MPI_SEND(displace,1,MPI_INTEGER,
     &                  pec,mtag,MPI_COMM_WORLD,info)
       enddo

       displaceloc=0
       !get data offset
       do pec=0,mpe-1
          CALL MPI_RECV(displace,1,MPI_INTEGER,pec,mtag,
     &                  MPI_COMM_WORLD,status,info)

          displaceloc=displaceloc+displace
       enddo

       CALL MPI_TYPE_EXTENT(MPI_REAL,realsize,info)
       disp=displaceloc*realsize

       !open file
       CALL MPI_FILE_OPEN(MPI_COMM_WORLD,'test.dat',
     &      MPI_MODE_WRONLY+MPI_MODE_CREATE,MPI_INFO_NULL,file,info)

       !set file view (displacement in bytes)
       CALL MPI_FILE_SET_VIEW(file,disp,MPI_REAL,
     &      MPI_REAL,'native',MPI_INFO_NULL,info)

       !write out data
       locidx(1)=1
       locidx(2)=2
       locidx(3)=3
       locidx(4)=4
       locidx(5)=5
       locidx(6)=6

       CALL MPI_FILE_WRITE(file,locidx,6,MPI_REAL,
     &      status,info)

       !wait until all processes are done
       !sync-barrier-sync recommended by mpi-consortium to guarantee
       !file consistency
       !http://www.mpi-forum.org/docs/mpi-20-html/node215.htm (2010)
       call MPI_FILE_SYNC(file,info)
       call MPI_BARRIER(MPI_COMM_WORLD,info)
       CALL MPI_FILE_SYNC(file,info)
       !close file
       call MPI_FILE_CLOSE(file,info)

       call MPI_FINALIZE(info)
       stop

       end

------------------------------------------------------------------------------------------------------------------------------------------------------