I was comparing a simple Send/Recv program with another program that does
two memcpys to and from shared memory, using 2 processes and array sizes
ranging from 10^6 to 10^8 doubles on IA64. With the --mca btl sm,self
options, the Send/Recv version gets almost twice the bandwidth of the
two-memcpy version. I looked at the Open MPI source and cannot figure out
whether it uses anything other than plain glibc memcpy. I must be missing
something. Can somebody
P.S. I was not sure whether to post this message to the users or the devel
list, so I posted to both. Apologies for the multiple postings.
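
For reference, here is a minimal sketch of what a two-memcpy bandwidth test
of this kind could look like. It is not the actual program: it runs in a
single process, uses an ordinary heap buffer as a stand-in for the shared
memory segment, and the function name two_memcpy_bandwidth is made up for
illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: copy n doubles src -> staging -> dst with two
 * memcpy calls and return the payload bandwidth in MB/s (10^6 bytes/s).
 * "staging" is a plain malloc'd buffer standing in for the shared
 * memory segment (an assumption; no real shared memory is used here).
 * Returns -1.0 on allocation failure or if the copy cannot be verified.
 */
double two_memcpy_bandwidth(size_t n)
{
    size_t bytes = n * sizeof(double);
    double *src = malloc(bytes);
    double *staging = malloc(bytes);
    double *dst = malloc(bytes);
    if (!src || !staging || !dst) {
        free(src);
        free(staging);
        free(dst);
        return -1.0;
    }

    /* Fill the source and touch the other buffers so that page faults
       are not charged to the timed region. */
    for (size_t i = 0; i < n; i++)
        src[i] = (double)i;
    memset(staging, 0, bytes);
    memset(dst, 0, bytes);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    memcpy(staging, src, bytes);   /* "sender" copy into the staging buffer */
    memcpy(dst, staging, bytes);   /* "receiver" copy out of it */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
    int ok = (memcmp(src, dst, bytes) == 0);

    free(src);
    free(staging);
    free(dst);

    if (!ok || secs <= 0.0)
        return -1.0;
    /* Bandwidth counts the payload once, as a Send/Recv test would. */
    return (double)bytes / secs / 1e6;
}
```

Calling two_memcpy_bandwidth(1000000) would correspond to the smallest array
size mentioned above. Note that both copies run back to back in one process
here, whereas a sender and a receiver in separate processes could overlap
them; that is one difference to keep in mind when comparing against the
Send/Recv numbers.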