I am running single-node Sandy Bridge cases with OpenMPI and looking at scaling.

I’m using --bind-to-core without any other options (the default is --bycore, I believe).
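For reference, the invocation looks roughly like the sketch below ("./solver" is a placeholder for the actual binary, and the rank count is just an example). Adding --report-bindings makes Open MPI print which core each rank was pinned to at startup, which is a quick way to confirm the binding really is one rank per core:

```shell
# Launch 15 ranks, each bound to its own core; --report-bindings
# prints the resulting rank-to-core map to stderr at startup.
# "./solver" is a placeholder binary name.
mpirun -np 15 --bind-to-core --report-bindings ./solver
```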

In the results below, the first number is the number of cores and the digit after the dot is the run number (every case except n=1 was repeated 3 times).  Any thoughts on why n15 should be so much slower than n16?  I also measured the RSS of the running processes: in the n=15 cases the rank 0 process uses about 2x more memory than the other ranks, whereas in the n=16 cases all ranks use the same amount of memory.
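In case it helps to reproduce the memory measurement, per-rank RSS can be read straight from /proc on Linux. A minimal sketch (the "solver" name in the usage comment is a placeholder for the actual binary):

```shell
# Print the resident set size (VmRSS) of each PID passed in, in kB.
rss_of() {
    for pid in "$@"; do
        awk -v p="$pid" '/^VmRSS/ {print p, $2, $3}' "/proc/$pid/status"
    done
}

# For all ranks of a running job ("solver" is a placeholder name):
# rss_of $(pgrep solver)
```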

Thanks for any insights,

Ed

n1.1:  6.9530
n2.1:  7.0185
n2.2:  7.0313
n3.1:  8.2069
n3.2:  8.1628
n3.3:  8.1311
n4.1:  7.5307
n4.2:  7.5323
n4.3:  7.5858
n5.1:  9.5693
n5.2:  9.5104
n5.3:  9.4821
n6.1:  8.9821
n6.2:  8.9720
n6.3:  8.9541
n7.1:  10.640
n7.2:  10.650
n7.3:  10.638
n8.1:  8.6822
n8.2:  8.6630
n8.3:  8.6903
n9.1:  9.5058
n9.2:  9.5255
n9.3:  9.4809
n10.1: 10.484
n10.2: 10.452
n10.3: 10.516
n11.1: 11.327
n11.2: 11.316
n11.3: 11.318
n12.1: 12.285
n12.2: 12.303
n12.3: 12.272
n13.1: 13.127
n13.2: 13.113
n13.3: 13.113
n14.1: 14.035
n14.2: 13.989
n14.3: 14.021
n15.1: 14.533
n15.2: 14.529
n15.3: 14.586
n16.1: 8.6542
n16.2: 8.6731
n16.3: 8.6586