Open MPI User's Mailing List Archives

From: George Bosilca (bosilca_at_[hidden])
Date: 2005-08-15 15:45:58


Joel,

I took a look at your code and found the error. It is a datatype
problem: the datatype described in your program does not match the
memory layout you expect in practice. Specifically, you did not set
the correct extent.

Let me show you the problem. Suppose 2 processes and the default
values from your program. The original matrix (at the root) is:
root 0.000000 1.000000 2.000000
root 3.000000 4.000000 5.000000
root 6.000000 7.000000 8.000000
root 9.000000 10.000000 11.000000
root 12.000000 13.000000 14.000000
root 15.000000 16.000000 17.000000
root 18.000000 19.000000 20.000000
root 21.000000 22.000000 23.000000
root 24.000000 25.000000 26.000000
root 27.000000 28.000000 29.000000

Your datatype is vector(5, 1, 3, MPI_DOUBLE). If you look at the
definition of the vector type in the MPI standard, you will notice
that the datatype ends at the end of the last element in the vector,
with no gap added after it. Thus the extent of your datatype is
13 doubles [(5 - 1) * 3 + 1]. Here is the memory covered by one
element:
root 0.000000 1.000000 2.000000
root 3.000000 4.000000 5.000000
root 6.000000 7.000000 8.000000
root 9.000000 10.000000 11.000000
root 12.000000

Then if you consider a memory layout containing 2 such datatypes (as
the scatter does), the second one starts right where the extent of the
first one ends, so its first element will be 13, not 15 as you expect.

Now if you need the second datatype to start at 15, you have to
extend the extent so that it covers the whole last row (elements 13
and 14 as well), which gives 15 doubles. You can use MPI_UB or
MPI_Type_create_resized (depending on whether you want MPI 1 or
MPI 2). Attached you will find a C program that does exactly what you
expect. You can define MORE_OUTPUT to see exactly how your matrices
get filled at each step.

   george.


PS: I was unable to compile any of the code attached to your email,
so I wrote my version starting from your code as well as your
description. I hope it answers your question.

On Aug 4, 2005, at 3:04 PM, Joel Eaves wrote:

> Hi group. I posted a general MPI question a while ago to the mpi
> newsgroup but didn't get a response. I need to figure this out so
> I thought I would try it on you.
>
> I have written a piece of code that fills a 2D array sequentially so
> that I can keep track of which elements are being dropped in the
> message passing. I use the type_vector datatype to generate a datatype
> for passing the columns. In C, I can see that the scatter operation
> passes the first matrix to process 0 correctly but that the second
> matrix to process 1 is screwed up because the elements are set
> backwards by two. In other words, the second matrix begins with the
> lucky 13th element instead of the 15th like it should. There is
> overlap -- the same elements appear in both of the scattered matrices.
> The C++ code goes over like a lead balloon. The operation is clearly
> asking for data outside of the range for the filled matrix and so the
> values of the scattered matrix are all screwed up. I am using LAM
> MPI v. 7.1.1 and Mac OS 10.3.8 with gcc v. 3.3. I got similar results
> using MPICH-2 on Linux. Here's a piece of code written in C.
>
> #include <mpi.h>
> #include <iostream>
>
> int main(int argc,char* argv[]){
> MPI_Init(&argc,&argv);
> int my_rank = MPI::COMM_WORLD.Get_rank(),n_global = 10,n_procs =
> MPI::COMM_WORLD.Get_size(),
> d=3,n_local = n_global/n_procs,i,k,root=0;
> double A_global[n_global][d],A_local[n_local][d];
> MPI_Datatype scatter;
> MPI_Type_vector(n_local,1,d,MPI_DOUBLE,&scatter);
> MPI_Type_commit(&scatter);
> if(my_rank==root){
> for(i=0;i<n_global;i++)
> for(k=0;k<d;k++)
> A_global[i][k] = i*d+k;
> for(k=0;k<d;k++)
> MPI_Scatter(&(A_global[0][k]),1,scatter,&(A_local[0][k]),
> 1,scatter,root,MPI_COMM_WORLD);
> for(i=0;i<n_local;i++){
> for(k=0;k<d;k++)
> cout << A_local[i][k] << "\t";
> cout << endl;
>
> }
>
> MPI_Finalize();
> return 0;
> }
>
> In C++, the code is
> #include <mpi.h>
> #include <iostream>
> int main(int argc,char* argv[]){
> MPI::Init();
> int my_rank = MPI::COMM_WORLD.Get_rank(),n_global = 10,n_procs =
> MPI::COMM_WORLD.Get_size(),
> d=3,n_local = n_global/n_procs,i,k,root=0;
> double A_global[n_global][d],A_local[n_local][d];
> MPI::Datatype scatter(MPI::DOUBLE);
> scatter.Create_vector(n_local,1,d);
> scatter.Commit();
> if(my_rank==root){
> for(i=0;i<n_global;i++)
> for(k=0;k<d;k++)
> A_global[i][k] = i*d+k;
> for(k=0;k<d;k++)
> MPI::COMM_WORLD.Scatter(&(A_global[0][k]),1,scatter,&(A_local[0]
> [k]),1,scatter,root);
> for(i=0;i<n_local;i++){
> for(k=0;k<d;k++)
> cout << A_local[i][k] << "\t";
> cout << endl;
>
> }
>
> MPI::Finalize();
> return 0;
> }
>
> I'm running the process (after a lamboot) with the command
> mpirun -np 2 scatter.out
>
> and compiling with the command
>
> mpic++ Scatter.cpp -o scatter.out
>
> Can anyone help out with this? I don't
> understand why the commands for C++ are returning erroneous results
> that are *different* than they are from the C program.
>
> Thanks,
>
> Joel