I was running a bunch of np=4 test programs over two nodes.
Occasionally, *one* of the codes would see an IBV_EVENT_QP_ACCESS_ERR
during MPI_Finalize(). I traced the code and ran another program that
mimicked the particular MPI calls made by that program. This other
program, too, would occasionally trigger this error. I never saw the
problem with other tests. The failure rate varies widely: I once saw it
on consecutive runs, more typically it is about one run in several
hundred, and sometimes it is rarer still -- I've had thousands of
consecutive runs with no problems. (The
tests run a few seconds apiece.) The traffic pattern is sends from
non-zero ranks to rank 0, with gathers rooted at rank 0, and lots of
Allgathers. The largest messages are 1000 bytes. The problem appears to
be seen always on rank 3.

Now, I wouldn't mind someone telling me, based on that little
information, what the problem is here, but I guess I don't expect that.
What I am asking is what IBV_EVENT_QP_ACCESS_ERR means. Again, it's
seen during MPI_Finalize(), and it is the async thread that sees it.
What is this error trying to tell me?