Hi Terry,

How does the stack for the non-SM BTL run look? I assume it is probably the same. Also, can you dump the message queues for rank 1? What's interesting is that you have a bunch of pending receives; do you expect that to be the case when the MPI_Gatherv occurs?

It turns out we had an unbalanced MPI_Bcast buried very deep in the application. After fixing that bug, the application behaves correctly.
Thank you all for the help, and sorry for the false alarm.