I am running single-node Sandy Bridge cases with OpenMPI and looking at scaling.
I'm using --bind-to-core without any other options (the default mapping is --bycore, I believe).
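One way to sanity-check the placement is to have mpirun print the binding it actually applies, and to inspect the kernel's view of a process's allowed cores. A minimal sketch (the binary name ./a.out is a placeholder; --report-bindings exists in the same OpenMPI releases as --bind-to-core):

```shell
# Ask OpenMPI to print each rank's core binding at startup:
#   mpirun -np 15 --bind-to-core --report-bindings ./a.out
# Independently, the kernel reports which cores a process may run on;
# shown here for the current shell, but it works for any rank's PID:
grep Cpus_allowed_list /proc/self/status
```

Comparing the two views can reveal whether ranks are landing where you expect, or whether two ranks share a core.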
The numbers below give the core count first; the second number is the run index (every case except n=1 was repeated 3 times). Any thoughts on why n=15 should be so much slower than n=16? I also measured the RSS of the running processes: in the n=15 cases the rank 0 process uses about 2x more memory than all the other ranks, whereas in the n=16 cases all ranks use the same amount of memory.
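For reference, the per-rank RSS figures come straight from /proc; a minimal sketch of the measurement (shown on the current process; for an MPI job you would substitute each rank's PID, e.g. from pgrep on your binary name, which is an assumption here):

```shell
# VmRSS in /proc/<pid>/status is the resident set size of that process,
# reported in kB. Demonstrated on /proc/self:
awk '/VmRSS/ {print $2, $3}' /proc/self/status
```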
Thanks for insights,
n1.1: 6.9530
n2.1: 7.0185   n2.2: 7.0313
n3.1: 8.2069   n3.2: 8.1628   n3.3: 8.1311
n4.1: 7.5307   n4.2: 7.5323   n4.3: 7.5858
n5.1: 9.5693   n5.2: 9.5104   n5.3: 9.4821
n6.1: 8.9821   n6.2: 8.9720   n6.3: 8.9541
n7.1: 10.640   n7.2: 10.650   n7.3: 10.638
n8.1: 8.6822   n8.2: 8.6630   n8.3: 8.6903
n9.1: 9.5058   n9.2: 9.5255   n9.3: 9.4809
n10.1: 10.484  n10.2: 10.452  n10.3: 10.516
n11.1: 11.327  n11.2: 11.316  n11.3: 11.318
n12.1: 12.285  n12.2: 12.303  n12.3: 12.272
n13.1: 13.127  n13.2: 13.113  n13.3: 13.113
n14.1: 14.035  n14.2: 13.989  n14.3: 14.021
n15.1: 14.533  n15.2: 14.529  n15.3: 14.586
n16.1: 8.6542  n16.2: 8.6731  n16.3: 8.6586
users mailing list