Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] alltoall messages > 2^26
From: Yevgeny Kliteynik (kliteyn_at_[hidden])
Date: 2011-05-29 08:49:41


Michael,

Could you try running this again with the "--mca mpi_leave_pinned 0" parameter?
I suspect this is a message size problem: MPI may be trying
to do RDMA with a message bigger than the HCA supports.

-- YK
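
For readers following the size discussion, here is a rough sketch of how the counts mentioned in this thread translate into bytes. The thread does not say whether "2^26" counts elements or bytes, so the element size and rank count below are assumptions for illustration only, and the helper names are hypothetical.

```python
# Back-of-the-envelope sizes for an all-to-all exchange.
# ASSUMPTIONS (not stated in the thread): ELEM_SIZE and NRANKS are made up,
# and "2^N" is read as a per-peer element count.

def alltoall_bytes(count_per_peer, elem_size, nranks):
    """Return (bytes in one per-peer message, bytes in one rank's send buffer)."""
    per_message = count_per_peer * elem_size
    per_buffer = per_message * nranks  # one block for every rank, including self
    return per_message, per_buffer

def human(nbytes):
    """Format a byte count using binary units."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if nbytes < 1024 or unit == "TiB":
            return f"{nbytes:g} {unit}"
        nbytes /= 1024

if __name__ == "__main__":
    ELEM_SIZE = 4  # hypothetical: 4-byte elements
    NRANKS = 8     # hypothetical: 8 ranks
    for exp in (24, 26, 27):
        msg, buf = alltoall_bytes(2**exp, ELEM_SIZE, NRANKS)
        print(f"2^{exp} elements: {human(msg)} per message, "
              f"{human(buf)} per send buffer")
```

Under these assumptions a single per-peer message is already in the hundreds of MiB at 2^26 elements, which is the kind of size worth comparing against whatever maximum message size the HCA reports.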

On 11-Apr-11 7:44 PM, Michael Di Domenico wrote:
> Here's a chunk of code that reproduces the error every time on my cluster.
>
> If you call it with $((2**24)) as a parameter it should run fine; change it to $((2**27)) and it will stall.
>
> On Tue, Apr 5, 2011 at 11:24 AM, Terry Dontje <terry.dontje_at_[hidden]> wrote:
>
> It was asked during the community concall whether the below may be related to ticket #2722 https://svn.open-mpi.org/trac/ompi/ticket/2722?
>
> --td
>
> On 04/04/2011 10:17 PM, David Zhang wrote:
>> Any error messages? Maybe the nodes ran out of memory? I know MPI implements some kind of buffering under the hood, so even though you're sending arrays over 2^26 in size, it may require more memory than that for MPI to actually send them.
>>
>> On Mon, Apr 4, 2011 at 2:16 PM, Michael Di Domenico <mdidomenico4_at_[hidden]> wrote:
>>
>> Has anyone seen an issue where OpenMPI/Infiniband hangs when sending
>> messages over 2^26 in size?
>>
>> For a reason I have not determined yet, machines on my cluster
>> (OpenMPI v1.5 and Qlogic Stack/QDR IB Adapters) are failing to send
>> arrays over 2^26 in size via the AllToAll collective (user code).
>>
>> Further testing seems to indicate that an MPI message over 2^26 fails
>> (tested with IMB-MPI)
>>
>> Running the same test on a different, older IB-connected cluster seems
>> to work, which would seem to indicate a problem with the InfiniBand
>> drivers of some sort rather than Open MPI (but I'm not sure).
>>
>> Any thoughts, directions, or tests?
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>>
>>
>> --
>> David Zhang
>> University of California, San Diego
>>
>
>
> --
> Oracle
> Terry D. Dontje | Principal Software Engineer
> Developer Tools Engineering | +1.781.442.2631
> Oracle - Performance Technologies
> 95 Network Drive, Burlington, MA 01803
> Email terry.dontje_at_[hidden]
>