(reposted with consolidated information)
I have a test rig comprising two i7 systems with 8 GB RAM and Mellanox III HCA 10G cards,
running CentOS 5.7, kernel 2.6.18-274
Open MPI 1.4.3
MLNX_OFED_LINUX-1.5.3-1.0.0.2 (OFED-1.5.3-1.0.0.2)
on a Cisco 24-port switch
 
Normal performance is:
$ mpirun --mca btl openib,self -n 2 -hostfile mpi.hosts  PingPong
results in:
 Max rate = 958.388867 MB/sec  Min latency = 4.529953 usec
and:
$ mpirun --mca btl tcp,self -n 2 -hostfile mpi.hosts  PingPong
Max rate = 653.547293 MB/sec  Min latency = 19.550323 usec
 
NetPIPE (MPI) results show a max of 7.4 Gb/s at 8388605 bytes, which seems fine.
log_num_mtt = 20 and log_mtts_per_seg = 2
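As a sanity check on those module parameters (using the commonly cited mlx4 formula, and assuming 4 KiB pages), the maximum registerable memory works out well above the 8 GB of RAM per node:

```python
# Commonly cited formula for mlx4 max registerable memory:
#   max_reg_mem = (2 ** log_num_mtt) * (2 ** log_mtts_per_seg) * PAGE_SIZE
log_num_mtt = 20
log_mtts_per_seg = 2
PAGE_SIZE = 4096  # assumption: 4 KiB pages

max_reg_mem = (2 ** log_num_mtt) * (2 ** log_mtts_per_seg) * PAGE_SIZE
print(max_reg_mem // 2**30, "GiB")  # 16 GiB, i.e. 2x physical RAM
```

So registered-memory exhaustion shouldn't be the issue here.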
 
My application exchanges about 1 GB of data between the processes, with 2 sender and 2 consumer processes on each node, plus 1 additional controller process on the starting node.
The program splits the data into 64K blocks and uses non-blocking sends and receives, with busy/sleep loops to monitor progress until completion.
Each process owns a single buffer for these 64K blocks.
 
 
My problem is that I see better performance under IPoIB than I do on native IB (RDMA_CM).
My understanding is that IPoIB is limited to about 1 GB/s, so I am at a loss to know why it is faster.
 
These two configurations are equivalent (about 8-10 seconds per cycle):
mpirun --mca btl_openib_flags 2 --mca mpi_leave_pinned 1 --mca btl tcp,self -H vh2,vh1 -np 9 --bycore prog
mpirun --mca btl_openib_flags 3 --mca mpi_leave_pinned 1 --mca btl tcp,self -H vh2,vh1 -np 9 --bycore prog
 
And this one produces similar run times, but seems to degrade with repeated cycles:
mpirun --mca btl_openib_eager_limit 64 --mca mpi_leave_pinned 1 --mca btl openib,self -H vh2,vh1 -np 9 --bycore  prog
 
Other btl_openib_flags settings result in much lower performance.
Changing the first of the above configs to use openib results in a 21-second run time at best; sometimes it takes up to 5 minutes.
In all cases, openib runs in twice the time it takes tcp, except if I push the small-message maximum to 64K and force short messages. Then the openib times are the same as tcp, and no faster.
 
With openib:
- Repeated cycles during a single run seem to slow down with each cycle
(usually by about 10 seconds).
- On occasion it seems to stall indefinitely, waiting on a single receive.
 
I'm still at a loss as to why. I can't find any errors logged during the runs.
Any ideas appreciated.
 
Thanks in advance,
Randolph