Hello.
As promised, here are the results of the different simulations and parameters, according
to the MPI options:

TEST | DESCRIPTION                                            | SHARING              | MPI | WITH PBS | ELAPSED TIME, 1ST ITERATION (s)
1    | Node 2                                                 | 12 processes         | yes | no       | 0.21875E+03
2    | Node 1                                                 | 12 processes         | yes | no       | 0.21957E+03
3    | Node 1, 24 processes to test multithreading            | 24 processes         | yes | no       | 0.20613E+03
4    | Node 2                                                 | 12 processes         | yes | yes      | 0.22130E+03
5    | Node 2, 24 processes to test multithreading            | 24 processes         | yes | no       | 0.27300E+03
6    |
7    | Nodes 1, 2                                             | 2 x 6 processes      | yes | yes      | 0.17304E+03
8    | Nodes 1, 2                                             | 2 x 11 processes     | yes | yes      | 0.12395E+03
9    | Nodes 1, 2                                             | 2 x 12 processes     | yes | yes      | 0.11812E+03
10   | Nodes 3, 4                                             | 2 x 12 processes     | yes | yes      | 0.11237E+03
11   | Nodes 1,2,3 with 1 extra process on node 3             | 2 x 12 + 1 processes | yes | yes      | 0.56223E+03
12   | Nodes 1,2,3; MPI options --bycore --bind-to-core       | 2 x 12 + 1 processes | yes | yes      | 0.32452E+03
13   | Nodes 1,4,3 with 1 extra process on node 3             | 2 x 12 + 1 processes | yes | yes      | 0.37252E+03
14   | Nodes 1,4,3; MPI options --bysocket --bind-to-socket   | 2 x 12 + 1 processes | yes | yes      | 0.56666E+03
15   | Nodes 1,4,3; MPI options --bycore --bind-to-core       | 2 x 12 + 1 processes | yes | yes      | 0.39983E+03
16   | Nodes 2,3,4                                            | 3 x 12 processes     | yes | yes      | 0.85723E+03
17   | Nodes 2,3,4                                            | 3 x 8 processes      | yes | yes      | 0.49378E+03
18   | Nodes 1,2,3                                            | 3 x 8 processes      | yes | yes      | 0.51863E+03
19   | Nodes 1,2,3,4                                          | 4 x 6 processes      | yes | yes      | 0.73272E+03
20   |
21   | Nodes 1,2,3,4; MPI options --bysocket --bind-to-socket | 4 x 6 processes      | yes | yes      | 0.67739E+03
22   | Nodes 1,2,3,4; MPI options --bycore --bind-to-core     | 4 x 6 processes      | yes | yes      | 0.69612E+03

The most surprising results, even taking the latency between nodes into account, are tests
11 to 15. Adding only 1 process on node 3 raises the elapsed time to 0.56e+03, i.e. about
5 times that of tests 9 and 10. When partitioning over 25 processes, node 3 (1 process out
of 25) represents 4% of the simulation (I have verified each partition: they all contain
approximately the same number of elements, plus or minus 8%). Even if one assumes a latency
factor of 10, i.e. 40% more, one should obtain (for test 10): 0.11e+03 x 1.40 ~= 0.154e+03 sec.
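
The estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic in this message: the times are copied from the table, the latency factor of 10 is the assumption stated above, and the variable names are chosen for illustration.

```python
# Times copied from the table (seconds, first iteration).
t_test10 = 0.11237e3        # test 10: 2 x 12 processes
t_test11 = 0.56223e3        # test 11: 2 x 12 + 1 processes
t_test10_rounded = 0.11e3   # rounded value used in the estimate above

n_processes = 25
share_node3 = 1 / n_processes            # node 3 holds 1 partition out of 25 (4%)
latency_factor = 10                      # assumed penalty, as stated in the message
overhead = share_node3 * latency_factor  # 0.4, i.e. 40% more

expected = t_test10_rounded * (1 + overhead)  # naive expectation for 25 processes
slowdown = t_test11 / t_test10                # what was actually measured

print(f"expected elapsed time: {expected:.1f} s")
print(f"measured elapsed time: {t_test11:.1f} s")
print(f"measured slowdown versus test 10: {slowdown:.2f}x")
```

Under these assumptions the measured time of test 11 is several times larger than the naive expectation, which is the discrepancy described above.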
In addition, when I observe the data transfers on the eth0 connection during an iteration, I see that
when nodes 1 and 2 transfer, for example, 5 MB, node 3 transfers 2.5 MB. But if we consider that
node 3 handles only 4% of the simulation data, it should need only about 200 KB!
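
The same reasoning for the network traffic, as a small Python sketch. It assumes, as this message does, that the traffic of a node scales with its share of the partitions, and it uses decimal units (1 MB = 1000 KB); the figures are the example values quoted above, and the variable names are illustrative.

```python
# Example figures quoted in the message, converted to KB (1 MB = 1000 KB).
traffic_nodes_1_2_kb = 5 * 1000      # traffic seen for nodes 1 and 2
observed_node3_kb = 2.5 * 1000       # traffic seen for node 3

share_node3 = 1 / 25                 # node 3 holds 1 partition out of 25 (4%)

# Assumption: traffic is proportional to the share of the simulation data.
expected_node3_kb = traffic_nodes_1_2_kb * share_node3
excess = observed_node3_kb / expected_node3_kb

print(f"expected traffic for node 3: {expected_node3_kb:.0f} KB")
print(f"observed traffic for node 3: {observed_node3_kb:.0f} KB")
print(f"observed / expected: {excess:.1f}")
```

With these numbers node 3 transfers more than ten times what the proportional estimate predicts.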
The results also differ considerably between the socket-binding and core-binding options, as tests 13, 14 and 15 show.
Regards.
Albert