My supercomputer has OpenMPI 1.4. I am running into a frustrating
problem with my MPI program. I am using only the following calls,
which I expect to be blocking:
Somehow I am getting this error when I do a large number of sequential
communications: "c002:2.0.Exhausted 1048576 MQ irecv request
descriptors, which usually indicates a user program error or
insufficient request descriptors (PSM_MQ_RECVREQS_MAX=1048576)"
This seems counter-intuitive to me: I do not think I should be using
any irecvs at all, since I specifically want to rely on the documented
blocking behavior of MPI_Recv (I never call MPI_Irecv).
My main program is quite large, but I have managed to replicate the
behavior in the much smaller attached program, which executes a number
of MPI_Send or MPI_Recv calls in a loop. The program's default behavior
is to run 2,000,000 iterations. When I turn it up to 20,000,000, after
a short time it generates the PSM_MQ_RECVREQS_MAX error.
I would appreciate it if anyone could advise why this might be
happening in this test case. Basically, what causes my presumably
blocking MPI_Recv calls to accumulate such a large number of irecv
request descriptors? I expected each call to block, be resolved as soon
as the matching MPI_Send is posted, and bring the count back down.
I appreciate your assistance. Thank you!
Research Assistant, U. Oklahoma
Attachment: crash.c (application/octet-stream)