It is polling at the barrier. This is done aggressively by default for performance. You can tell it to be less aggressive if you want via the yield_when_idle mca param.