Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] IMB Sendrecv hangs with OpenMPI 1.6.5 and XRC
From: Sasso, John (GE Power & Water, Non-GE) (John1.Sasso_at_[hidden])
Date: 2014-04-23 14:14:33


Thank you, Jeff. I re-ran IMB (a 64-core run, distributed across a number of nodes) under different MCA parameters. Here are the results using OpenMPI 1.6.5:

1. --mca btl openib,sm,self --mca btl_openib_receive_queues X,9216,256,128,32:X,65536,256,128,32
        IMB did not hang. Consumed 9263 sec (aggregate) CPU time and 8986 MB memory

2. --mca btl openib,sm,self --mca btl_openib_receive_queues X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32
        IMB hung on Bcast benchmark on a 64-process run, with message size of 64 bytes

3. --mca btl openib,sm,self
        IMB did not hang. Consumed 9360 sec (aggregate) CPU time and 9360 MB memory

4. --mca btl openib,tcp,self
        IMB did not hang. Consumed 41911 sec (aggregate) CPU time and 9239 MB memory
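
To make the two btl_openib_receive_queues values above easier to compare, here is a minimal Python sketch that splits such a string into its per-queue fields. It assumes the field order given in the Open MPI FAQ for shared receive queues, which XRC ("X") queues are documented to share: buffer size in bytes, number of buffers, low watermark, maximum pending sends. The names QueueSpec and parse_receive_queues are made up for illustration and are not part of Open MPI, and the sketch only handles entries that give all four numeric fields, as the ones in this thread do.

```python
from collections import namedtuple

# Hypothetical helper type; the field names follow the FAQ description of
# SRQ/XRC queue parameters and are an assumption, not Open MPI identifiers.
QueueSpec = namedtuple(
    "QueueSpec", "kind size num_buffers low_watermark max_pending_sends"
)


def parse_receive_queues(value):
    """Split a btl_openib_receive_queues string into one QueueSpec per queue."""
    specs = []
    for entry in value.split(":"):          # queues are separated by ':'
        kind, *numbers = entry.split(",")   # fields are separated by ','
        if len(numbers) != 4:
            raise ValueError("expected 4 numeric fields in %r" % entry)
        specs.append(QueueSpec(kind, *(int(n) for n in numbers)))
    return specs


def all_xrc(specs):
    """True if every queue is an XRC queue (kind 'X')."""
    return all(spec.kind == "X" for spec in specs)


if __name__ == "__main__":
    # The two settings tried in runs 1 and 2 above.
    run1 = "X,9216,256,128,32:X,65536,256,128,32"
    run2 = ("X,128,256,192,128:X,2048,256,128,32:"
            "X,12288,256,128,32:X,65536,256,128,32")
    for label, value in (("run 1", run1), ("run 2", run2)):
        print(label)
        for spec in parse_receive_queues(value):
            # size * num_buffers is the posted receive buffer space per queue
            print("  ", spec, "buffer bytes:", spec.size * spec.num_buffers)
```

Printing the two settings side by side shows that the first queue in run 2 is the only one whose low watermark and maximum pending sends differ from the 128 and 32 used everywhere else. That is just an observation about the values, not a diagnosis of the hang.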

I did not try OpenMPI 1.8.1 since I am restricted to 1.6.5 at this time, but I'm doing a build of 1.8.1 now to test out. BTW, the release notes refer to 1.8.2 but the site only has 1.8.1 available for download.

I am a bit concerned, however, about my prior runs hanging. First, I was unable to discern why IMB was hanging, so any advice/guidance would be greatly appreciated. I tried running strace on an MPI process, but it yielded no helpful information.

Second, the motivation behind using XRC was to cut down on memory demands w.r.t. the RC QPs. I'd like to get this working, unless someone can elaborate on the negative aspects of using XRC instead of RC QPs. Thanks!

--john

-----Original Message-----
From: users [mailto:users-bounces_at_[hidden]] On Behalf Of Jeff Squyres (jsquyres)
Sent: Wednesday, April 23, 2014 11:19 AM
To: Open MPI Users
Subject: Re: [OMPI users] IMB Sendrecv hangs with OpenMPI 1.6.5 and XRC

A few suggestions:

- Try using Open MPI 1.8.1. It's the newest release, and has many improvements since the 1.6.x series.

- Try using "--mca btl openib,sm,self" (in both v1.6.x and v1.8.x). This allows Open MPI to use shared memory to communicate between processes on the same server, which can be a significant performance improvement over TCP or even IB.

On Apr 23, 2014, at 11:10 AM, "Sasso, John (GE Power & Water, Non-GE)" <John1.Sasso_at_[hidden]> wrote:

> I am running IMB (Intel MPI Benchmarks), the MPI-1 benchmarks, which was built with the Intel 12.1 compiler suite and OpenMPI 1.6.5 (and running w/ OMPI 1.6.5). I decided to use the following for the MCA parameters:
>
> --mca btl openib,tcp,self --mca btl_openib_receive_queues X,9216,256,128,32:X,65536,256,128,32
>
> where before, I always used "--mca btl openib,tcp,self". This is for performance analysis. On the SendRecv benchmark at 32 processes, IMB hangs. I then tried:
>
> --mca btl_openib_receive_queues X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32
>
> and IMB also hangs on the SendRecv benchmark, though at 64 processes.
>
> No errors have been recorded, not even in any system log files, but 'top' shows the MPI tasks running. How can I go about troubleshooting this hang, as well as figuring out what (if any) XRC-related MCA parameters in btl_openib_receive_queues I have to specify to get IMB running properly? I did verify the IB cards are ConnectX.
>
> --john
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/