Open MPI logo

Open MPI User's Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all Open MPI User's mailing list

Subject: Re: [OMPI users] MPIRUN Error on Mac pro i7 laptop and linux desktop
From: Eugene Loh (eugene.loh_at_[hidden])
Date: 2010-08-03 01:38:43


christophe petit wrote:
Thanks for your answers,

the execution of this parallel program works fine at my work, but we used MPICH2. I thought this will run with OPEN-MPI too.

In your input deck, how big are x_domains and y_domains -- that is, iconf(3) and iconf(4)?  Do they have to be changed if you change the number of processes you run on?  Off hand, it looks like x_domains*y_domains = iconf(3)*iconf(4) should equal nproc.  If you can run with nproc=1 and don't change the input deck, you won't be able to run on nproc/=1.

Given that the problem is in MPI_Cart_shift, could you produce a much smaller program that illustrates the error you're trying to understand?

Here is the f90 source where MPI_CART_SHIFT is called :

      program heat
!**************************************************************************
!
!   This program solves the heat equation on the unit square [0,1]x[0,1]
!        | du/dt - Delta(u) = 0
!        |  u/gamma = cste
!   by implementing a explicit scheme.
!   The discretization is done using a 5 point finite difference scheme
!   and the domain is decomposed into sub-domains.
!   The PDE is discretized using a 5 point finite difference scheme
!   over a (x_dim+2)*(x_dim+2) grid including the end points
!   correspond to the boundary points that are stored.
!
!   The data on the whole domain are stored in
!   the following way :
!
!    y
!           ------------------------------------
!    d      |                                  |
!    i      |                                  |
!    r      |                                  |
!    e      |                                  |
!    c      |                                  |
!    t      |                                  |
!    i      | x20                              |
!    o /\   |                                  |
!    n  |   | x10                              |
!       |   |                                  |
!       |   | x00  x01 x02 ...                 |
!       |   ------------------------------------
!        -------> x direction  x(*,j)
!
!   The boundary conditions are stored in the following submatrices
!
!
!        x(1:x_dim, 0)          ---> left   temperature
!        x(1:x_dim, x_dim+1)    ---> right  temperature
!        x(0, 1:x_dim)          ---> top    temperature
!        x(x_dim+1, 1:x_dim)    ---> bottom temperature
!
!**************************************************************************
      implicit none
      include 'mpif.h'
! size of the discretization
      integer :: x_dim, nb_iter
      double precision, allocatable :: x(:,:),b(:,:),x0(:,:)
      double precision  :: dt, h, epsilon
      double precision  :: resLoc, result, t, tstart, tend
!
      integer :: i,j
      integer :: step, maxStep
      integer :: size_x, size_y, me, x_domains,y_domains
      integer :: iconf(5), size_x_glo
      double precision conf(2)

! MPI variables
      integer :: nproc, infompi, comm, comm2d, lda, ndims
      INTEGER, DIMENSION(2)  :: dims
      LOGICAL, DIMENSION(2)  :: periods
      LOGICAL, PARAMETER     :: reorganisation = .false.
      integer :: row_type
      integer, parameter :: nbvi=4
      integer, parameter :: S=1, E=2, N=3, W=4
      integer, dimension(4) :: neighBor

!
      intrinsic abs
!
!
      call MPI_INIT(infompi)
      comm = MPI_COMM_WORLD
      call MPI_COMM_SIZE(comm,nproc,infompi)
      call MPI_COMM_RANK(comm,me,infompi)
!
!
      if (me.eq.0) then
          call readparam(iconf, conf)
      endif
      call MPI_BCAST(iconf,5,MPI_INTEGER,0,comm,infompi)
      call MPI_BCAST(conf,2,MPI_DOUBLE_PRECISION,0,comm,infompi)
!
      size_x    = iconf(1)
      size_y    = iconf(1)
      x_domains = iconf(3)
      y_domains = iconf(4)
      maxStep   = iconf(5)
      dt        = conf(1)
      epsilon   = conf(2)
!
      size_x_glo = x_domains*size_x+2
      h      = 1.0d0/dble(size_x_glo)
      dt     = 0.25*h*h
!
!
      lda = size_y+2
      allocate(x(0:size_y+1,0:size_x+1))
      allocate(x0(0:size_y+1,0:size_x+1))
      allocate(b(0:size_y+1,0:size_x+1))
!
! Create 2D cartesian grid
      periods(:) = .false.

      ndims = 2
      dims(1)=x_domains
      dims(2)=y_domains
      CALL MPI_CART_CREATE(MPI_COMM_WORLD, ndims, dims, periods, &
        reorganisation,comm2d,infompi)
!
! Identify neighbors
!
      NeighBor(:) = MPI_PROC_NULL
! Left/West and right/Est neigbors
      CALL MPI_CART_SHIFT(comm2d,0,1,NeighBor(W),NeighBor(E),infompi)
! Bottom/South and Upper/North neigbors
      CALL MPI_CART_SHIFT(comm2d,1,1,NeighBor(S),NeighBor(N),infompi)
!
! Create row data type to coimmunicate with South and North neighbors
!
      CALL MPI_TYPE_VECTOR(size_x, 1, size_y+2, MPI_DOUBLE_PRECISION, row_type,infompi)
      CALL MPI_TYPE_COMMIT(row_type, infompi)
!
! initialization
!
      call initvalues(x0, b, size_x+1, size_x )
!
! Update the boundaries
!
      call updateBound(x0,size_x,size_x, NeighBor, comm2d, row_type)
   
      step = 0
      t    = 0.0
!
      tstart = MPI_Wtime()
! REPEAT
 10   continue
!
         step = step + 1
         t    = t + dt
! perform one step of the explicit scheme
         call Explicit(x0,x,b, size_x+1, size_x, size_x, dt, h, resLoc)
! update the partial solution along the interface
         call updateBound(x0,size_x,size_x, NeighBor, comm2d, row_type)
! Check the distance between two iterates
         call MPI_ALLREDUCE(resLoc,result,1, MPI_DOUBLE_PRECISION, MPI_SUM,comm,infompi)
         result= sqrt(result)
!
         if (me.eq.0) write(*,1002) t,result
!
       if ((result.gt.epsilon).and.(step.lt.maxStep)) goto 10
!
! UNTIL "Convergence"
!
       tend = MPI_Wtime()
       if (me.eq.0) then
         write(*,*)
         write(*,*) ' Convergence after ', step,' steps '
         write(*,*) '      Problem size              ', size_x*x_domains*size_y*y_domains
         write(*,*) ' Wall Clock                     ', tend-tstart

!
! Print the solution at each point of the grid
!
         write(*,*)
         write(*,*) ' Computed solution '
         write(*,*)
         do 30, j=size_x+1,0,-1
            write(*,1000)(x0(j,i),i=0,size_x+1)
 30      continue
       endif
!
      call MPI_FINALIZE(infompi)
!
      deallocate(x)
      deallocate(x0)
      deallocate(b)
!
! Formats available to display the computed values on the grid
!
1000  format(100(1x, f7.3))
1001  format(100(1x, e7.3))
1002   format(' At time ',E8.2,' Norm ', E8.2)

!
      stop
      end
!

------------------------------------------------------------------------------
2010/7/28 Jeff Squyres <jsquyres@cisco.com>
According to the error message (especially since it's consistent across 2 different platforms), it looks like you have an error in your application.  Open MPI says that you're using an invalid communicator when calling MPI_Cart_shift.  "Invalid" probably means that it's not a Cartesian communicator.

You might want to double check the definition and requirements of the MPI_CART_SHIFT function (see the MPI_Cart_shift(3) man page).



On Jul 28, 2010, at 12:28 PM, christophe petit wrote:

> hello,
>
> i have a problem concerning the execution of a f90 program (explicitPar) compiled with openmpi-1.4.2. I get nearly the same error on my debian desktop ( AMD Phenom(tm) 9550 Quad-Core Processor) and my mac pro i7 laptop :
>
> on mac pro i7 :
>
> $ mpiexec -np 2 explicitPar
> [macbook-pro-de-fab.livebox.home:48805] *** An error occurred in MPI_Cart_shift
> [macbook-pro-de-fab.livebox.home:48805] *** on communicator MPI_COMM_WORLD
> [macbook-pro-de-fab.livebox.home:48805] *** MPI_ERR_COMM: invalid communicator
> [macbook-pro-de-fab.livebox.home:48805] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> --------------------------------------------------------------------------
> mpiexec has exited due to process rank 1 with PID 48805 on
> node macbook-pro-de-fab.livebox.home exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpiexec (as reported here).
>
> ---------------------------------------------------------------------------
>
> on my debian desktop :
>
> mpirun -np 2 explicitPar
> [pablo:11665] *** An error occurred in MPI_Cart_shift
> [pablo:11665] *** on communicator MPI_COMM_WORLD
> [pablo:11665] *** MPI_ERR_COMM: invalid communicator
> [pablo:11665] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 1 with PID 11665 on
> node pablo exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
>
>
> I have installed openmpi-1.4.2 with the following options :
>
> ./configure --prefix=/usr/local/openmpi --enable-mpi-f77 --enable-mpi-f90
>
> with exported variables on bash shell : FC=gfortran F90=gfortran F77=gfortran CC=gcc CXX=g++
>
> The  installation has been completed, the program compiles fine but i don't understand what's wrong. I note that with a single processor ("mpirun -np 1 explicitPar"), execution works fine.
>
> My debian desktop is a quad-core, so, theoretically, i can put up to 4 for "np" parameter.
> On my mac pro i7, i don't know how processors are there, but the "htop" command makes appear 4 cores too.
>
> Anyone has a solution ?
>
> Regards.
>
>
>
>
>
>
>
> _______________________________________________
> users mailing list
> users@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Jeff Squyres
jsquyres@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


_______________________________________________
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


_______________________________________________ users mailing list users@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users