Hi!
The following code misbehaves when run over openib.
Open MPI version: 1.3.3
With openib it dies with "error polling HP CQ with status WORK REQUEST
FLUSHED ERROR status number 5"; with tcp or shmem it works as expected.
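
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>   /* sleep() */
#include "mpi.h"

int main(int argc, char *argv[])
{
    int rank;
    int n;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Timestamp before the broadcast. */
    fprintf(stderr, "I am %d at %ld\n", rank, (long)time(NULL));
    fflush(stderr);

    n = 4;
    /* MPI_INT is the C datatype matching int (MPI_INTEGER is the Fortran one). */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* Timestamp after the broadcast. */
    fprintf(stderr, "I am %d at %ld\n", rank, (long)time(NULL));
    fflush(stderr);

    /* Rank 0 makes no MPI calls for 60 seconds while the others wait in the barrier. */
    if (rank == 0) {
        sleep(60);
    }

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    exit(0);
}
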
I know about the internal Open MPI reason for it to behave as it does.
But I think that a program should be allowed to behave this way.
This example is a bit contrived, but there are codes where a similar
situation can occur, i.e. the Bcast sender does a lot of other work
after the Bcast before its next MPI call. VASP is a candidate for this.
--
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: ake_at_[hidden] Phone: +46 90 7866134 Fax: +46 90 7866126
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se