Open MPI User's Mailing List Archives

From: Tim Campbell (tim.campbell_at_[hidden])
Date: 2007-01-10 17:19:51


Greetings,

Attached is a small test Fortran program that triggers a failure in
mpi_waitall. The problem is that after a couple of calls to
mpi_startall and mpi_waitall, some of the MPI requests become
corrupted, which causes the next call to mpi_startall to fail.
Here is the output from a 2-CPU run; a sketch of the send/receive
pattern appears after the output.

[44]% mpif90 -g test_ompi.f
[45]% mpirun -np 2 a.out
TEST(A): 0 1 | 2 3 4 5
TEST(B): 0 1 | 2 3 4 5
OUTPUT: 0 1 | 100 100 101 101
TEST(A): 0 2 | 2 3 4 5
TEST(B): 0 2 | -32766 -32766 4 5
OUTPUT: 0 2 | 200 200 201 201
TEST(A): 1 1 | 2 3 4 5
TEST(B): 1 1 | 2 3 4 5
OUTPUT: 1 1 | 101 101 100 100
TEST(A): 1 2 | 2 3 4 5
TEST(B): 1 2 | -32766 -32766 4 5
OUTPUT: 1 2 | 201 201 200 200
^Cmpirun: killing job...
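
For readers without the attachment, here is a minimal sketch of the
persistent-request pattern described above (it is not the original
test_ompi.f): mpi_send_init/mpi_recv_init set up four requests, and
then mpi_startall/mpi_waitall are called repeatedly, printing the
request handles before and after each wait. The tags, buffer
contents, and number of requests are illustrative assumptions.

      program test_sketch
c     Minimal sketch (not the original attachment) of the persistent-
c     request pattern described above.  The tags, buffer contents,
c     and the choice of four requests are illustrative assumptions.
      implicit none
      include 'mpif.h'
      integer ierr, rank, nproc, peer, iter, i
      integer req(4)
      integer stats(MPI_STATUS_SIZE,4)
      integer sbuf(2), rbuf(2)

      call mpi_init(ierr)
      call mpi_comm_rank(MPI_COMM_WORLD, rank, ierr)
      call mpi_comm_size(MPI_COMM_WORLD, nproc, ierr)
      peer = mod(rank+1, nproc)

c     persistent requests: two sends and two receives to/from peer
      call mpi_send_init(sbuf(1), 1, MPI_INTEGER, peer, 11,
     &                   MPI_COMM_WORLD, req(1), ierr)
      call mpi_send_init(sbuf(2), 1, MPI_INTEGER, peer, 12,
     &                   MPI_COMM_WORLD, req(2), ierr)
      call mpi_recv_init(rbuf(1), 1, MPI_INTEGER, peer, 11,
     &                   MPI_COMM_WORLD, req(3), ierr)
      call mpi_recv_init(rbuf(2), 1, MPI_INTEGER, peer, 12,
     &                   MPI_COMM_WORLD, req(4), ierr)

      do iter = 1, 3
         sbuf(1) = 100*iter + rank
         sbuf(2) = 100*iter + rank
c        print the Fortran request handles before and after the wait
         print *, 'TEST(A):', rank, iter, '|', req
         call mpi_startall(4, req, ierr)
         call mpi_waitall(4, req, stats, ierr)
         print *, 'TEST(B):', rank, iter, '|', req
         print *, 'OUTPUT: ', rank, iter, '|', rbuf
      end do

      do i = 1, 4
         call mpi_request_free(req(i), ierr)
      end do
      call mpi_finalize(ierr)
      end

Compiled and run the same way as above (mpif90 followed by
mpirun -np 2 a.out), the TEST(B) lines are where the corrupted
handle values would be expected to appear.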

The "-32766" values show up in the mpi_request array after the second
call to mpi_waitall. Using prints in the OpenMPI code I have tracked
the problem to

ompi/request/req_wait.c:ompi_request_wait_all().

I find that upon entry to ompi_request_wait_all() the values of
request[:]->req_f_to_c_index are valid. However, upon exit from
ompi_request_wait_all() the first two entries of
request[:]->req_f_to_c_index have the value -32766.

I am testing with Open MPI version 1.2b2. The problem occurs on
both x86_64 and Intel i386, and it occurs with both the Portland
Group compilers and GCC/G95.

Cheers,
Tim Campbell
Naval Research Laboratory