Following up on this, I have a partial resolution. The primary culprit
appears to be stale files in a ramdisk non-uniformly distributed across
the sockets, thus interacting poorly with NUMA. The slow runs
invariably have high numa_miss and numa_foreign counts. I still have
trouble making it account for a degradation of up to a factor of 10,
but it certainly explains a factor of 3.
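
For anyone who wants to check the same counters around a run, here is a
minimal sketch. It assumes a Linux system that exposes per-node counters
in /sys/devices/system/node/node*/numastat as "name value" lines. The
helper names (parse_numastat, read_all_nodes, counter_delta) are
hypothetical and only for illustration; this is not the exact procedure
used for the measurements above.

```python
import glob
import os


def parse_numastat(text):
    """Parse 'name value' lines from a per-node numastat file into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters


def read_all_nodes(base="/sys/devices/system/node"):
    """Return {node_name: counters} for every NUMA node found under base.

    The default path is an assumption about the sysfs layout on Linux.
    """
    stats = {}
    pattern = os.path.join(base, "node[0-9]*", "numastat")
    for path in sorted(glob.glob(pattern)):
        node = os.path.basename(os.path.dirname(path))
        with open(path) as f:
            stats[node] = parse_numastat(f.read())
    return stats


def counter_delta(before, after, keys=("numa_miss", "numa_foreign")):
    """Per-node change in the selected counters between two snapshots."""
    return {
        node: {
            k: after[node].get(k, 0) - before.get(node, {}).get(k, 0)
            for k in keys
        }
        for node in after
    }


if __name__ == "__main__":
    # Snapshot once; for a real comparison, snapshot before and after the
    # run and pass both snapshots to counter_delta().
    for node, counters in read_all_nodes().items():
        print(node, counters.get("numa_miss"), counters.get("numa_foreign"))
```

Taking one snapshot before a run and one after, then comparing the
deltas per node, should show whether a slow run coincides with a jump in
numa_miss and numa_foreign. The counters are system-wide, so other
activity on the machine will also show up in them.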