Open MPI User's Mailing List Archives

Subject: [OMPI users] "An error occurred in MPI_Recv" with more than 2 CPU
From: vasilis (gkanis_at_[hidden])
Date: 2009-05-26 07:11:52


Dear Open MPI users,

I am trying to develop a code that runs in parallel with Open MPI (version 1.3.2). The code is written in Fortran 90, and I am running it on a cluster.

If I use 2 CPUs, the program runs fine, but for a larger number of CPUs I get the
following error:

[compute-2-6.local:18491] *** An error occurred in MPI_Recv
[compute-2-6.local:18491] *** on communicator MPI_COMM_WORLD
[compute-2-6.local:18491] *** MPI_ERR_TRUNCATE: message truncated
[compute-2-6.local:18491] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)

Here is the part of the code that this error refers to:
if (mumps_par%MYID .eq. 0) then
   ! Rank 0: add its own contribution first
   res = res + res_cpu
   do iw = 1, total_elem_cpu*unique
      jacob(iw)        = jacob(iw)        + jacob_cpu(iw)
      position_col(iw) = position_col(iw) + col_cpu(iw)
      position_row(iw) = position_row(iw) + row_cpu(iw)
   end do

   ! ... then collect the contributions of the other nsize-1 ranks
   do jw = 1, nsize-1
      call MPI_Recv(jacob_cpu, total_elem_cpu*unique, MPI_DOUBLE_PRECISION, &
                    MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, status1, ierr)
      call MPI_Recv(res_cpu, total_unknowns, MPI_DOUBLE_PRECISION, &
                    MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, status2, ierr)
      call MPI_Recv(row_cpu, total_elem_cpu*unique, MPI_INTEGER, &
                    MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, status3, ierr)
      call MPI_Recv(col_cpu, total_elem_cpu*unique, MPI_INTEGER, &
                    MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, status4, ierr)

      res = res + res_cpu
      ! Place each received block at an offset given by the sender's rank
      do iw = 1, total_elem_cpu*unique
         jacob(status1(MPI_SOURCE)*total_elem_cpu*unique+iw) = &
            jacob(status1(MPI_SOURCE)*total_elem_cpu*unique+iw) + jacob_cpu(iw)
         position_col(status4(MPI_SOURCE)*total_elem_cpu*unique+iw) = &
            position_col(status4(MPI_SOURCE)*total_elem_cpu*unique+iw) + col_cpu(iw)
         position_row(status3(MPI_SOURCE)*total_elem_cpu*unique+iw) = &
            position_row(status3(MPI_SOURCE)*total_elem_cpu*unique+iw) + row_cpu(iw)
      end do
   end do
else
   ! All other ranks: send their contributions to rank 0, tagged with their rank id
   call MPI_Isend(jacob_cpu, total_elem_cpu*unique, MPI_DOUBLE_PRECISION, 0, &
                  mumps_par%MYID, MPI_COMM_WORLD, request1, ierr)
   call MPI_Isend(res_cpu, total_unknowns, MPI_DOUBLE_PRECISION, 0, &
                  mumps_par%MYID, MPI_COMM_WORLD, request2, ierr)
   call MPI_Isend(row_cpu, total_elem_cpu*unique, MPI_INTEGER, 0, &
                  mumps_par%MYID, MPI_COMM_WORLD, request3, ierr)
   call MPI_Isend(col_cpu, total_elem_cpu*unique, MPI_INTEGER, 0, &
                  mumps_par%MYID, MPI_COMM_WORLD, request4, ierr)

   call MPI_Wait(request1, status1, ierr)
   call MPI_Wait(request2, status2, ierr)
   call MPI_Wait(request3, status3, ierr)
   call MPI_Wait(request4, status4, ierr)
end if
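
For context, the variables in the snippet are declared roughly as follows. This is a simplified sketch rather than the exact declarations from my source, so treat the types and names here as assumptions:

   ! Rough sketch of the declarations the snippet above relies on (simplified)
   include 'mpif.h'

   integer :: ierr, iw, jw, nsize, total_elem_cpu, unique, total_unknowns
   integer :: status1(MPI_STATUS_SIZE), status2(MPI_STATUS_SIZE)
   integer :: status3(MPI_STATUS_SIZE), status4(MPI_STATUS_SIZE)
   integer :: request1, request2, request3, request4
   double precision, allocatable :: jacob(:), jacob_cpu(:), res(:), res_cpu(:)
   integer, allocatable :: position_row(:), position_col(:), row_cpu(:), col_cpu(:)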

I am also using the MUMPS library.

Could someone help me track this error down? It is really annoying to be able to use only two processors.
The cluster has about 8 nodes, each with 4 dual-core CPUs. I tried to run the code on a single node with more than 2 CPUs, but I got the same error!
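
If it helps with debugging, I could add something like the following before each receive to see which message is actually pending. This is only a rough, untested sketch; probe_status and incoming_bytes are illustrative names, not variables from my code:

   ! Peek at the next pending message to report its source, tag, and size in bytes
   integer :: probe_status(MPI_STATUS_SIZE), incoming_bytes

   call MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, probe_status, ierr)
   call MPI_Get_count(probe_status, MPI_BYTE, incoming_bytes, ierr)
   print *, 'next message: source =', probe_status(MPI_SOURCE), &
            ' tag =', probe_status(MPI_TAG), ' bytes =', incoming_bytes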

If you need more info to identify this error, I will gladly provide it.

Thank you for your time.
Vasilis