Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] [mpich-discuss] problem with MPI_Get_count() for very long (but legal length) messages.
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-02-08 21:33:43


FWIW, I filed https://svn.open-mpi.org/trac/ompi/ticket/2241 about this.

Thanks Jed!

On Feb 6, 2010, at 10:56 AM, Jed Brown wrote:

> On Fri, 5 Feb 2010 14:28:40 -0600, Barry Smith <bsmith_at_[hidden]> wrote:
> > To cheer you up, when I run with Open MPI it runs forever, sucking down
> > 100% CPU trying to send the messages :-)
>
> On my test box (x86 with 8GB memory), Open MPI (1.4.1) does complete
> after several seconds, but still prints the wrong count.
>
> MPICH2 does not actually send the message, as you can see by running the
> attached code.
>
> # Open MPI 1.4.1, correct cols[0]
> [0] sending...
> [1] receiving...
> [1] count -103432106, cols[0] 0
>
> # MPICH2 1.2.1, incorrect cols[0]
> [1] receiving...
> [0] sending...
> [1] count -103432106, cols[0] 1
>
>
> How much memory does crush have (you need about 7GB to do this without
> swapping)? In particular, most of the time it took Open MPI to send the
> message (with your source) was actually just spent faulting the
> send/recv buffers. The attached code faults the buffers first (by
> writing to every element), and the subsequent send/recv takes less
> than 2 seconds.
>
> Actually, it's clear that MPICH2 never touches either buffer because it
> returns immediately regardless of whether they have been faulted first.
>
> Jed
>
>
> #include <mpi.h>
> #include <stdio.h>
> #include <stdlib.h>
>
> int main(int argc,char **argv)
> {
>   int ierr,i,size,rank;
>   int cnt = 433438806;  /* cnt*sizeof(long long) exceeds INT_MAX bytes */
>   MPI_Status status;
>   long long *cols;
>
>   MPI_Init(&argc,&argv);
>   ierr = MPI_Comm_size(MPI_COMM_WORLD,&size);
>   ierr = MPI_Comm_rank(MPI_COMM_WORLD,&rank);
>   if (size != 2) {
>     fprintf(stderr,"[%d] usage: mpiexec -n 2 %s\n",rank,argv[0]);
>     MPI_Abort(MPI_COMM_WORLD,1);
>   }
>
>   cols = malloc(cnt*sizeof(long long));
>   if (!cols) {
>     fprintf(stderr,"[%d] malloc failed\n",rank);
>     MPI_Abort(MPI_COMM_WORLD,1);
>   }
>   /* Writing every element faults the buffer in before the send/recv. */
>   for (i=0; i<cnt; i++) cols[i] = rank;
>   if (rank == 0) {
>     printf("[%d] sending...\n",rank);
>     ierr = MPI_Send(cols,cnt,MPI_LONG_LONG_INT,1,0,MPI_COMM_WORLD);
>   } else {
>     printf("[%d] receiving...\n",rank);
>     ierr = MPI_Recv(cols,cnt,MPI_LONG_LONG_INT,0,0,MPI_COMM_WORLD,&status);
>     /* cnt is overwritten with the count the library reports. */
>     ierr = MPI_Get_count(&status,MPI_LONG_LONG_INT,&cnt);
>     /* cols[0] == 0 means rank 0's data really arrived. */
>     printf("[%d] count %d, cols[0] %lld\n",rank,cnt,cols[0]);
>   }
>   free(cols);
>   ierr = MPI_Finalize();
>   return 0;
> }

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/