On Dec 4, 2009, at 6:54 PM, Nicolas Bock wrote:
> in our code we use a very short front-end program to drive a larger set of codes that do our calculations. Right in the beginning of the front-end, we have an if() statement such that only the rank 0 front-end does something, and the other ranks go right away to an MPI_Barrier() statement, waiting for the rank 0 front-end to finish. The rank 0 front-end then goes ahead and does its thing by calling the other codes with MPI_Comm_spawn().
> We noticed that the rank > 0 copies of the front-end consume a lot of CPU while they are waiting at the MPI_Barrier(). This is obviously not what we had intended. From previous discussion on this list I understand that the CPU consumption stems from the aggressive polling frequency of the MPI_Barrier() function. While I understand that there are a lot of situations where a high polling frequency in MPI_Barrier() is useful, the situation we are in is not one of them.
> Is our master and slave programming model such an unusual way of using MPI?
Define "unusual". :-)
MPI applications tend to be home-grown and very specific to individual problems that they are trying to solve. *Most* MPI applications are trying to get the lowest latency (perhaps it's more accurate to say "the most discussed/cited MPI applications"). As such, we designed Open MPI to get that lowest latency -- and that's by polling. :-\ I can't speak for other MPI implementations, but I believe that most of them will do similar things.
Some users don't necessarily want rock-bottom latency; they want features like the one you describe (e.g., true blocking in blocking MPI functions). At the moment, Open MPI doesn't cater to those requirements. We've had a lot of discussions over the years on this specific feature and we have some pretty good ideas about how to do it. But its priority has never percolated up high enough for someone to actually start working on it. :-\
So I wouldn't characterize your application as "unusual" -- *every* MPI app is unusual. I would say that your requirement for not consuming CPU while in a blocking MPI function is in the minority.
There were a few suggestions posted earlier today on how to be creative and get around these implementation artifacts -- perhaps they might be useful to you...?
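As a rough sketch of one workaround of that general kind (an assumption on my part, not necessarily what was suggested earlier): instead of parking the rank > 0 processes in MPI_Barrier(), have them poll for a "done" message from rank 0 and sleep between polls. The names sleepy_wait, countdown_poll, done_message_arrived, DONE_TAG and the USE_MPI switch below are all made up for illustration; the only MPI call used is MPI_Iprobe(), and nanosleep() is assumed to be available (POSIX).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for nanosleep() under strict C modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: returns nonzero once the awaited event happened. */
typedef int (*poll_fn)(void *state);

/*
 * Poll until poll() reports completion, sleeping sleep_ns nanoseconds between
 * polls so the waiting process mostly stays off the CPU.
 * Returns the number of times poll() was called.
 */
long sleepy_wait(poll_fn poll, void *state, long sleep_ns)
{
    assert(poll != NULL);
    assert(sleep_ns >= 0 && sleep_ns < 1000000000L);

    struct timespec pause = { 0, sleep_ns };
    long polls = 1;

    while (!poll(state)) {
        nanosleep(&pause, NULL);   /* give the CPU back between polls */
        polls++;
    }
    return polls;
}

/* Stand-in poll function without MPI: "completes" after *state polls. */
int countdown_poll(void *state)
{
    int *remaining = state;
    if (*remaining > 0) {
        (*remaining)--;
        return 0;
    }
    return 1;
}

#ifdef USE_MPI   /* hypothetical build switch; needs an MPI compiler wrapper */
#include <mpi.h>

#define DONE_TAG 42   /* hypothetical tag for the "rank 0 is finished" message */

/*
 * MPI poll function: nonzero once a DONE_TAG message from rank 0 is pending.
 * Assumes rank 0 sends such a message to every waiting rank when its
 * MPI_Comm_spawn() work has finished; the waiter then receives it as usual.
 */
int done_message_arrived(void *state)
{
    int flag = 0;
    (void)state;
    MPI_Iprobe(0, DONE_TAG, MPI_COMM_WORLD, &flag, MPI_STATUS_IGNORE);
    return flag;
}
#endif
```

With MPI enabled, the rank > 0 processes would call something like sleepy_wait(done_message_arrived, NULL, 10000000L) in place of the barrier. The trade-off is latency: a waiting rank may notice completion up to one sleep interval late, which should not matter much for a front-end that only waits for rank 0 to finish.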