I have attached a small program that, when run on my machine, produces
the error message below and then locks up:
[node0000:06319] [mpool_gm_module.c:100] error(8) registering gm memory
I get the error when I run with 32 processors, but not with 4 (even if I
increase the loop count to 20000). This is on a cluster of dual
dual-core Opterons with Myrinet switches (i.e. using the gm routines).
Unfortunately, I don't have the configure options that were used to
build Open MPI, but I don't think there was anything unusual. I've also
attached the ompi_info output. Here is the compile line for the code:
g95 -o allreducetest allreducetest.F -I/usr/local/ompi/1.1-gcc/include
-L/usr/local/ompi/1.1-gcc/lib -lmpi
Also note that I had to change the Fortran include files in Open MPI to
force all of the integers to be of size 4 (i.e. declaring them
integer(4)), since the default integer size used by g95 is 8 bytes but
the Open MPI Fortran interface was compiled with f77, which uses 4-byte
integers.
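In case it helps anyone reproduce this, the default integer size of a
compiler can be checked with a small program along these lines. This is
only a sketch: the program name int_size and the variable idefault are
made up for illustration and are not part of the attached test.

```fortran
      program int_size
c     Hypothetical check, not part of the attached test: prints the
c     size in bytes of the compiler's default integer kind.
      integer:: idefault
c     bit_size is a standard inquiry intrinsic; it returns the number
c     of bits in the integer kind of its argument.
      print*,"default integer bytes: ",bit_size(idefault)/8
      end
```

If this reports 8 while the MPI library was built for 4-byte integers,
then undeclared-kind integers passed to MPI calls would not match.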
Any suggestions on what to look for?
Thanks for the help,
Dave
      program parallel_sum_mmnts
      real(kind=8):: zmmnts(0:360,28,0:8)
c Use reduction routines to sum whole beam moments across all
c of the processors. It also shares z moment data at PE boundaries.
c --- temporary for z moments
      real(kind=8),allocatable:: ztemp(:,:,:)
      integer(4):: nn,nslaves,my_index,ii
      include "mpif.h"
      integer(4):: mpierror
      call MPI_INIT(mpierror)
      call MPI_COMM_SIZE(MPI_COMM_WORLD,nslaves,mpierror)
      call MPI_COMM_RANK(MPI_COMM_WORLD,my_index,mpierror)
      do ii=1,20000
        print*,"PSM1 ",ii,my_index
        zmmnts0 = my_index
        zmmnts = my_index
        allocate(ztemp(0:360,28,0:8))
c --- Do reduction on beam z moments.
        ztemp = zmmnts
        nn = (1+360)*28*(1+8)
        print*,"PSM1 ",my_index,nn
        call MPI_ALLREDUCE(ztemp,zmmnts,nn,
     &   MPI_DOUBLE_PRECISION,MPI_SUM,MPI_COMM_WORLD,mpierror)
        print*,"PSM2 ",my_index
        deallocate(ztemp)
      enddo
      stop
      end
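
For scale, each MPI_ALLREDUCE call above reduces nn = 361*28*9 = 90972
double-precision values, i.e. 727776 bytes per buffer, and ztemp is
allocated and deallocated on every pass through the loop. The arithmetic
as a small stand-alone sketch (the program name msg_size and the
variable nbytes are made up for illustration and are not in the attached
test):

```fortran
      program msg_size
c     Hypothetical helper, not part of the attached test: works out
c     the size of one MPI_ALLREDUCE buffer used in the test program.
      integer(4):: nn,nbytes
c     Same element count as in the test: 361 * 28 * 9.
      nn = (1+360)*28*(1+8)
c     real(kind=8) values are assumed to occupy 8 bytes each.
      nbytes = nn*8
      print*,"elements per call: ",nn
      print*,"bytes per call:    ",nbytes
      end
```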