Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Deadlock in MPI_File_write_all on Infiniband
From: Edgar Gabriel (gabriel_at_[hidden])
Date: 2009-10-12 13:09:52

I am wondering whether this is really due to the use of
File_write_all. We have had a bug in the 1.3 series so far (which will be
fixed in 1.3.4) where we lost message segments and thus had a deadlock
in Comm_dup if there was communication occurring *right after* the
Comm_dup. File_open executes a comm_dup internally.

If you replace write_all with write, you avoid that communication.
If you replace ib with tcp, your entire timing is different and you might
simply not hit the deadlock by accident...

Just my $0.02 ...


Dorian Krause wrote:
> Dear list,
> the attached program deadlocks in MPI_File_write_all when run with 16
> processes on two 8 core nodes of an Infiniband cluster. It runs fine when I
> a) use tcp
> or
> b) replace MPI_File_write_all by MPI_File_write
> I'm using openmpi V. 1.3.2 (but I checked that the problem also
> occurs with version 1.3.3). The OFED version is 1.4 (installed via
> Rocks). The operating system is CentOS 5.2.
> I compile with gcc-4.1.2. The openmpi configure flags are
> ../../configure --prefix=/share/apps/openmpi/1.3.2/gcc-4.1.2/
> --with-io-romio-flags=--with-file-system=nfs+ufs+pvfs2
> --with-wrapper-ldflags=-L/share/apps/pvfs2/lib
> CPPFLAGS=-I/share/apps/pvfs2/include/ LDFLAGS=-L/share/apps/pvfs2/lib
> LIBS=-lpvfs2 -lpthread
> The user home directories are mounted via nfs.
> Is it a problem with the user code, the system or with openmpi?
> Thanks,
> Dorian

Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335