On Sun, 2009-11-08 at 20:40 -0800, Martin Siegert wrote:
> Hi,
>
> I am running into a problem with mpi_allreduce when large buffers
> are used. This does not appear to be unique to mpi_allreduce; it
> occurs with mpi_send/mpi_recv as well; program is attached.
> 1) run this using MPI_Allreduce:
> allreduce completed 2.700941
> enter array size (integer; negative to stop):
> 320000000
>
> At this point the program just hangs forever.
You could use padb (It's linked to in my sig) to tell you where the
application is stuck - it could just be swapping.
> All programs/libraries are 64bit, interconnect is IB.
> I expect problems with sizes larger than 2^31-1, but these array sizes
> are still much smaller.
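Whilst the message counts are smaller than 2^31-1, you should be aware
that the message sizes in bytes are larger, as the counts are multiplied
by sizeof(double) (8 on typical 64-bit platforms): 320000000 * 8 =
2560000000 bytes, which is above 2^31-1 = 2147483647. So I wouldn't
rule out this theory.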
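
To make the arithmetic concrete, here is a minimal sketch in C. The
helper names are made up for illustration and are not taken from the
attached program or from any MPI library; it only checks whether a
count of doubles, once converted to bytes, still fits in a signed
32-bit integer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: size in bytes of a message holding `count`
   doubles. The product is computed in 64 bits so it cannot wrap. */
uint64_t message_bytes(int count)
{
    assert(count >= 0); /* assumption: a valid, non-negative count */
    return (uint64_t)count * sizeof(double);
}

/* Returns 1 if the byte size exceeds 2^31-1 (INT32_MAX), i.e. it
   would overflow anywhere a byte length is kept in a 32-bit int. */
int exceeds_int32_bytes(int count)
{
    return message_bytes(count) > (uint64_t)INT32_MAX;
}
```

Assuming an 8-byte double, exceeds_int32_bytes(320000000) returns 1,
and the largest count that still fits is 268435455.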
Also, you are mallocing at least 4GB per process, and quite possibly a
large amount more is used for buffering in the MPI library as well, so
it could be that you are simply running out of memory.
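
As a rough way to estimate the footprint, here is a small sketch in C.
It assumes (and this is only an assumption) that each process holds
one send and one receive buffer of the same length; the function names
are illustrative, and any buffering inside the MPI library comes on
top of the figure it gives.

```c
#include <stdint.h>

/* Bytes needed for `nbuffers` arrays of `count` doubles each.
   Assumption: nbuffers = 2 (send + receive) for the test in question. */
uint64_t buffer_bytes(uint64_t count, unsigned nbuffers)
{
    return count * nbuffers * sizeof(double);
}

/* Convert a byte count to GiB (2^30 bytes) for easier reading. */
double bytes_to_gib(uint64_t bytes)
{
    return (double)bytes / (1024.0 * 1024.0 * 1024.0);
}
```

With an 8-byte double, buffer_bytes(320000000, 2) is 5120000000 bytes,
a little under 4.8 GiB per process before the library allocates
anything of its own.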
Ashley,
--
Ashley Pittman, Bath, UK.
Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk