Dear Open MPI users,
I am trying to develop a code that runs in parallel with Open MPI (version
1.3.2). The code is written in Fortran 90, and I am running it on a cluster.
If I use 2 CPUs the program runs fine, but with a larger number of CPUs I get
the following error:
[compute-2-6.local:18491] *** An error occurred in MPI_Recv
[compute-2-6.local:18491] *** on communicator MPI_COMM_WORLD
[compute-2-6.local:18491] *** MPI_ERR_TRUNCATE: message truncated
[compute-2-6.local:18491] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
Here is the part of the code that this error refers to:
if( mumps_par%MYID .eq. 0 ) THEN
   res=res+res_cpu
   do iw=1,total_elem_cpu*unique
      jacob(iw)=jacob(iw)+jacob_cpu(iw)
      position_col(iw)=position_col(iw)+col_cpu(iw)
      position_row(iw)=position_row(iw)+row_cpu(iw)
   end do
   do jw=1,nsize-1
      call MPI_recv(jacob_cpu,total_elem_cpu*unique,MPI_DOUBLE_PRECISION,&
           MPI_ANY_SOURCE,MPI_ANY_TAG,MPI_COMM_WORLD,status1,ierr)
      call MPI_recv(res_cpu,total_unknowns,MPI_DOUBLE_PRECISION,&
           MPI_ANY_SOURCE,MPI_ANY_TAG,MPI_COMM_WORLD,status2,ierr)
      call MPI_recv(row_cpu,total_elem_cpu*unique,MPI_INTEGER,&
           MPI_ANY_SOURCE,MPI_ANY_TAG,MPI_COMM_WORLD,status3,ierr)
      call MPI_recv(col_cpu,total_elem_cpu*unique,MPI_INTEGER,&
           MPI_ANY_SOURCE,MPI_ANY_TAG,MPI_COMM_WORLD,status4,ierr)
      res=res+res_cpu
      do iw=1,total_elem_cpu*unique
         jacob(status1(MPI_SOURCE)*total_elem_cpu*unique+iw)=&
              jacob(status1(MPI_SOURCE)*total_elem_cpu*unique+iw)+jacob_cpu(iw)
         position_col(status4(MPI_SOURCE)*total_elem_cpu*unique+iw)=&
              position_col(status4(MPI_SOURCE)*total_elem_cpu*unique+iw)+col_cpu(iw)
         position_row(status3(MPI_SOURCE)*total_elem_cpu*unique+iw)=&
              position_row(status3(MPI_SOURCE)*total_elem_cpu*unique+iw)+row_cpu(iw)
      end do
   end do
else
   call MPI_Isend(jacob_cpu,total_elem_cpu*unique,MPI_DOUBLE_PRECISION,&
        0,mumps_par%MYID,MPI_COMM_WORLD,request1,ierr)
   call MPI_Isend(res_cpu,total_unknowns,MPI_DOUBLE_PRECISION,&
        0,mumps_par%MYID,MPI_COMM_WORLD,request2,ierr)
   call MPI_Isend(row_cpu,total_elem_cpu*unique,MPI_INTEGER,&
        0,mumps_par%MYID,MPI_COMM_WORLD,request3,ierr)
   call MPI_Isend(col_cpu,total_elem_cpu*unique,MPI_INTEGER,&
        0,mumps_par%MYID,MPI_COMM_WORLD,request4,ierr)
   call MPI_Wait(request1, status1, ierr)
   call MPI_Wait(request2, status2, ierr)
   call MPI_Wait(request3, status3, ierr)
   call MPI_Wait(request4, status4, ierr)
end if
I am also using the MUMPS library.
Could someone help me track this error down? It is really annoying to be
limited to only two processors.
The cluster has about 8 nodes, each with 4 dual-core CPUs. I tried running the
code on a single node with more than 2 CPUs, but I got the same error.
If you need more information to identify this error, I will gladly provide it.
Thank you for your time.
Vasilis