Paul, I tried NetPIPE MPI (belatedly, because their site was down for a couple of days).

The results show a max of 7.4 Gb/s at 8388605 bytes, which seems fine.

But my program still runs slowly and stalls occasionally.
I'm using 1 buffer per process - I assume this is OK.
Is it of any significance that the log_num_mtt and log_mtts_per_seg params were not set?
Is this a symptom of a broken install?

Reposting the original message for clarity - it's been a few days...
The follow-up posts are below this first section.
I have a test rig comprising 2 i7 systems with 8GB RAM and Mellanox III HCA 10G cards
running CentOS 5.7, kernel 2.6.18-274
Open MPI 1.4.3
MLNX_OFED_LINUX-1.5.3- (OFED-1.5.3-
on a Cisco 24-port switch

Normal performance is:
$ mpirun --mca btl openib,self -n 2 -hostfile mpi.hosts  PingPong
results in:
 Max rate = 958.388867 MB/sec  Min latency = 4.529953 usec
$ mpirun --mca btl tcp,self -n 2 -hostfile mpi.hosts  PingPong
Max rate = 653.547293 MB/sec  Min latency = 19.550323 usec

My application exchanges about a gig of data between the processes, with 2 sender and 2 consumer processes on each node and 1 additional controller process on the starting node.
The program splits the data into 64K blocks and uses non-blocking sends and receives, with busy/sleep loops to monitor progress until completion.
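
For reference, a minimal sketch of the pattern (made-up names and block count, assuming standard MPI non-blocking calls - not the actual code):

/* Sketch: 64K-block non-blocking exchange with a test/sleep progress loop. */
#include <mpi.h>
#include <stdlib.h>
#include <unistd.h>

#define BLOCK   65536          /* 64K blocks, as in the real program     */
#define NBLOCKS 16             /* illustrative; the real run moves ~1 GB */

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char *buf = malloc((size_t)BLOCK * NBLOCKS);
    MPI_Request req[NBLOCKS];

    if (rank == 0) {                           /* "sender" role   */
        for (int i = 0; i < NBLOCKS; i++)
            MPI_Isend(buf + (size_t)i * BLOCK, BLOCK, MPI_BYTE,
                      1, i, MPI_COMM_WORLD, &req[i]);
    } else if (rank == 1) {                    /* "consumer" role */
        for (int i = 0; i < NBLOCKS; i++)
            MPI_Irecv(buf + (size_t)i * BLOCK, BLOCK, MPI_BYTE,
                      0, i, MPI_COMM_WORLD, &req[i]);
    }

    if (rank <= 1) {
        int done = 0;
        while (!done) {                        /* busy/sleep progress loop */
            MPI_Testall(NBLOCKS, req, &done, MPI_STATUSES_IGNORE);
            if (!done)
                usleep(100);
        }
    }

    free(buf);
    MPI_Finalize();
    return 0;
}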

My problem is that I see better performance under IPoIB than I do on native IB (RDMA_CM).
My understanding is that IPoIB is limited to about 1 GB/s, so I am at a loss to know why it is faster.

These 2 configurations are equivalent (about 8-10 seconds per cycle):
mpirun --mca btl_openib_flags 2 --mca mpi_leave_pinned 1 --mca btl tcp,self -H vh2,vh1 -np 9 --bycore prog
mpirun --mca btl_openib_flags 3 --mca mpi_leave_pinned 1 --mca btl tcp,self -H vh2,vh1 -np 9 --bycore prog

And this one produces similar run times but seems to degrade with repeated cycles:
mpirun --mca btl_openib_eager_limit 64 --mca mpi_leave_pinned 1 --mca btl openib,self -H vh2,vh1 -np 9 --bycore  prog

Other btl_openib_flags settings result in much lower performance.
Changing the first of the above configs to use openib results in a 21 second run time at best; sometimes it takes up to 5 minutes.
With openib:
- Repeated cycles during a single run seem to slow down with each cycle.
- On occasions it seems to stall indefinitely, waiting on a single receive.

Any ideas appreciated.

Thanks in advance,

From: Randolph Pullen <>
To: Paul Kapinos <>; Open MPI Users <>
Sent: Thursday, 30 August 2012 11:46 AM
Subject: Re: [OMPI users] Infiniband performance Problem and stalling

Interesting - the log_num_mtt and log_mtts_per_seg params were not set.
Setting them to utilise 2*8G of my RAM resulted in no change to the stalls or run time, i.e. (log_num_mtt, log_mtts_per_seg) = (19,3), (20,2), (21,1) or (6,16).
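(For reference, the rule of thumb I was working from, assuming 4K pages, is
   max registerable memory = 2^log_num_mtt * 2^log_mtts_per_seg * page_size
so (20,2) gives 2^20 * 2^2 * 4 KB = 16 GB, and each of the pairs above works out to the same 16 GB, i.e. 2*8G.)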
In all cases, openib runs in twice the time it takes TCP, except if I push the small-message maximum to 64K and force short messages. Then the openib times are the same as TCP and no faster.
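(By forcing short messages I mean raising the eager limit so a whole 64K block goes out eagerly, along the lines of:
   mpirun --mca btl_openib_eager_limit 65536 --mca btl openib,self -H vh2,vh1 -np 9 --bycore prog
- the exact value shown here is illustrative.)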

I'm still at a loss as to why...

From: Paul Kapinos <>
To: Randolph Pullen <>; Open MPI Users <>
Sent: Tuesday, 28 August 2012 6:13 PM
Subject: Re: [OMPI users] Infiniband performance Problem and stalling

after reading this:

On 08/28/12 04:26, Randolph Pullen wrote:
> - On occasions it seems to stall indefinitely, waiting on a single receive.

... I would make a blind guess: are you aware of the IB card parameters for registered memory?

"Waiting forever" for a single operation is one of symptoms of the problem especially in 1.5.3.


P.S. The lower performance with 'big' chunks is a known phenomenon, cf.
(image at the bottom of the page). But a chunk size of 64k is fairly small.

-- Dipl.-Inform. Paul Kapinos  -  High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915
