Hi there,
I have attached a small piece of code that reproduces a "bug?" that has
been annoying me a great deal. Issuing several calls to
MPI_WIN_LOCK/MPI_WIN_UNLOCK seems to hang some processes until an
MPI_BARRIER is encountered!?
My experience with MPI is very modest, so I apologize in advance if I
have misread the MPI-2 specs, but it looks to me as if what I want to do
is correct.
If you look at the file hangs.F90, the code starts with several calls to
LOCK/UNLOCK and then goes on with, let's say, a big piece of work,
between the comments "start action" and "action done". For the purpose
of this example, that's a do loop that runs for 10 seconds.
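For readers without the attachment, the structure described above looks
roughly like the following sketch. It is a reconstruction and not the
contents of hangs.F90: the window layout (one integer per rank), the use
of MPI_GET, the MPI_LOCK_SHARED lock type, the loop over all ranks, and
the busy-wait timer are all assumptions made for illustration.

```fortran
program hangs_sketch
  use mpi
  implicit none
  integer :: ierr, rank, nprocs, win, tgt, intsize
  integer :: buf, val
  integer(kind=MPI_ADDRESS_KIND) :: winsize, disp
  double precision :: t0

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

  ! Assumption: each rank exposes a single integer through the window.
  call MPI_TYPE_SIZE(MPI_INTEGER, intsize, ierr)
  winsize = intsize
  buf = rank
  call MPI_WIN_CREATE(buf, winsize, intsize, MPI_INFO_NULL, &
                      MPI_COMM_WORLD, win, ierr)

  ! Several passive-target epochs, one per target rank.
  ! Assumption: shared locks and a simple MPI_GET inside each epoch.
  disp = 0
  do tgt = 0, nprocs - 1
    call MPI_WIN_LOCK(MPI_LOCK_SHARED, tgt, 0, win, ierr)
    call MPI_GET(val, 1, MPI_INTEGER, tgt, disp, 1, MPI_INTEGER, win, ierr)
    call MPI_WIN_UNLOCK(tgt, win, ierr)
  end do

  ! No barrier here: the work below is meant to start independently.
  print *, 'start action for rank=', rank
  t0 = MPI_WTIME()
  do while (MPI_WTIME() - t0 < 10.0d0)
    ! stand-in for the "big piece of work" (about 10 seconds)
  end do
  print *, 'action done for rank=', rank

  call MPI_BARRIER(MPI_COMM_WORLD, ierr)
  call MPI_WIN_FREE(win, ierr)
  call MPI_FINALIZE(ierr)
end program hangs_sketch
```

Built with mpif90 and launched with mpirun, this should follow the same
sequence as the attached code: LOCK/UNLOCK calls, the timed work, then a
barrier.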
I don't want to put a barrier after the various calls to LOCK/UNLOCK
because I want it to run asynchronously. Also note that I don't need
a mutex or anything similar; all those calls can be made simultaneously
and in any order. My only problem is the following hang:
Here is the output when the code runs on an SMP machine (8 cores) with
an increasing number of processes (the same occurs with distributed
memory).
mpirun -np 1 ./hangs
start action for rank= 0
(10 seconds later)
action done for rank= 0
<----works as I expect.
mpirun -np 2 ./hangs
start action for rank= 1
start action for rank= 0
(10 secs later)
action done for rank= 1
action done for rank= 0
<----so far so good; but with more processes the "bug?" appears:
mpirun -np 3 ./hangs
start action for rank= 1
start action for rank= 0
(10 secs later)
action done for rank= 0
action done for rank= 1
start action for rank= 2
(10 secs later)
action done for rank= 2
Process 2 remained stuck on the MPI_WIN_UNLOCK statement until processes
0 and 1 reached the MPI_BARRIER call, which effectively makes the
execution serial :)
I tested with up to 8 processes and the problem becomes even worse: a
random number of processes get stuck on the MPI_WIN_UNLOCK. However, this
does not occur on every execution. Sometimes, though rarely, all the
processes are released from the UNLOCK as expected.
Additionally, if an MPI_BARRIER is issued just after the MPI_WIN_UNLOCK,
the problem disappears; but I never read in the MPI-2 specs that
this should be required, and it would completely defeat the purpose of
performing asynchronous operations.
gcc/gfortran: 4.6.3
Open MPI: 1.4.5
Please let me know if this behaviour can be fixed and if you need
additional information!
Thanks in advance,
Cheers,
Chris.