I have used gprof to profile a program that uses Open MPI. The profile
shows that the code spends a large share of its time in poll() (37% on
8 cores, 50% on 16 and 85% on 32). Is there anything I can do to reduce
the time spent in poll()? From the profile I cannot determine how many
calls are made to poll() or where they come from. The bulk of my code
uses MPI_Ssend exclusively for sends, and MPI_Irecv followed by MPI_Wait
for receives. For instance, should I expect any gain from switching
from MPI_Ssend to MPI_Send? Alternatively, would there be any gain in
switching to MPI_Isend/MPI_Recv instead of MPI_Ssend/MPI_Irecv?
Platform: Red Hat EL5 x86_64, Intel Xeon 2.7 GHz.
I am using the sm and tcp BTLs on nodes with 8 cores (2 quad-core CPUs)
each (so 4 nodes for 32 cores).