Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] alltoall messages > 2^26
From: Michael Di Domenico (mdidomenico4_at_[hidden])
Date: 2011-04-05 08:31:13


There are no messages being spit out, but I'm not sure I have all the
correct debug options turned on. I turned on -debug-devel, -debug-daemons, and
mca_verbose, but it appears that the process just hangs.

If it's memory exhaustion, it's not from core memory; these nodes
have 48GB of memory. It could be a buffer somewhere, but I'm not sure
where.
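
For reference, here is a minimal Alltoall test sketch along the lines of
what we're running (C; it assumes roughly 2^26 bytes sent to each peer, and
the buffer sizes and MPI_CHAR datatype are just illustrative, not our actual
user code):

/* Minimal MPI_Alltoall sketch; the 2^26-byte per-peer message size
 * is an assumption based on the sizes discussed in this thread. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank, nprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* ~2^26 bytes per destination rank; adjust to taste */
    size_t count = (size_t)1 << 26;

    char *sendbuf = malloc(count * (size_t)nprocs);
    char *recvbuf = malloc(count * (size_t)nprocs);
    if (sendbuf == NULL || recvbuf == NULL) {
        fprintf(stderr, "rank %d: allocation failed\n", rank);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    memset(sendbuf, rank & 0xff, count * (size_t)nprocs);

    /* each rank sends 'count' chars to every other rank */
    MPI_Alltoall(sendbuf, (int)count, MPI_CHAR,
                 recvbuf, (int)count, MPI_CHAR, MPI_COMM_WORLD);

    if (rank == 0)
        printf("alltoall of %zu bytes per peer completed\n", count);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}

Something like this, launched with mpirun across the affected nodes, should
exercise the same code path.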

On Mon, Apr 4, 2011 at 10:17 PM, David Zhang <solarbikedz_at_[hidden]> wrote:
> Any error messages?  Maybe the nodes ran out of memory?  I know MPI
> implements some kind of buffering under the hood, so even though you're
> sending arrays over 2^26 in size, it may require more than that for MPI to
> actually send it.
>
> On Mon, Apr 4, 2011 at 2:16 PM, Michael Di Domenico <mdidomenico4_at_[hidden]>
> wrote:
>>
>> Has anyone seen an issue where Open MPI/InfiniBand hangs when sending
>> messages over 2^26 in size?
>>
>> For a reason I have not determined just yet, machines on my cluster
>> (Open MPI v1.5 and QLogic stack/QDR IB adapters) are failing to send
>> arrays over 2^26 in size via the AllToAll collective (user code).
>>
>> Further testing seems to indicate that an MPI message over 2^26 fails
>> (tested with IMB-MPI).
>>
>> Running the same test on a different, older IB-connected cluster seems
>> to work, which would seem to indicate a problem with the InfiniBand
>> drivers of some sort rather than Open MPI (but I'm not sure).
>>
>> Any thoughts, directions, or tests?
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
> --
> David Zhang
> University of California, San Diego
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>