Hi,
I am running into a problem with mpi_allreduce when large buffers
are used. However, the problem does not appear to be unique to
mpi_allreduce; it occurs with mpi_send/mpi_recv as well. The test
program is attached.
1) running the MPI_Allreduce version:
# mpiexec -machinefile mfile -n 2 ./allreduce
choose algorithm: enter 1 for MPI_Allreduce
enter 2 for MPI_Send/Recv and MPI_Bcast
1
enter array size (integer; negative to stop):
40000000
allreduce completed 0.661867
enter array size (integer; negative to stop):
80000000
allreduce completed 1.356263
enter array size (integer; negative to stop):
160000000
allreduce completed 2.700941
enter array size (integer; negative to stop):
320000000
At this point the program just hangs forever.
2) running the MPI_Send/MPI_Recv/MPI_Bcast version:
# mpiexec -machinefile mfile -n 2 ./allreduce
choose algorithm: enter 1 for MPI_Allreduce
enter 2 for MPI_Send/Recv and MPI_Bcast
2
enter array size (integer; negative to stop):
40000000
id=0 received data from id=1 in 0.263818
bcast completed in 0.652631
allreduce completed in 1.102356
enter array size (integer; negative to stop):
80000000
id=0 received data from id=1 in 0.671201
bcast completed in 1.298208
allreduce completed in 2.341906
enter array size (integer; negative to stop):
160000000
[[43618,1],0][btl_openib_component.c:2951:handle_wc] from b2 to: b1 error polling LP CQ with status LOCAL LENGTH ERROR status number 1 for wr_id 102347120 opcode 1 vendor error 105 qp_idx 3
--------------------------------------------------------------------------
mpiexec has exited due to process rank 0 with PID 26254 on
node b2 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpiexec (as reported here).
--------------------------------------------------------------------------
All programs and libraries are 64-bit; the interconnect is InfiniBand (IB).
I expect problems with sizes larger than 2^31-1, but these array sizes
are still much smaller.
What is the problem here?
Cheers,
Martin
--
Martin Siegert
Head, Research Computing
WestGrid Site Lead
IT Services phone: 778 782-4691
Simon Fraser University fax: 778 782-4242
Burnaby, British Columbia email: siegert_at_[hidden]
Canada V5A 1S6