
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] [Open MPI] #3351: JAVA scatter error
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2012-12-18 13:58:05


On Dec 18, 2012, at 12:05 PM, Siegmar Gross wrote:

> I know how to use MPI_Scatter or MPI_Scatterv in C, because I have
> written some small and working example programs myself in the past.
> My first Java program with MPI_Scatter was ColumnScatterMain.java,
> which I sent to the list in early October and now once more to you in
> December. On October 10th I sent the program ColumnSendRecvMain.java
> to the list (Subject: Datatype.Vector in mpijava in openmpi-1.9a1r27380)
> because I thought, and still think, that building a column vector
> doesn't work as expected. At the end of that email I wrote "In my
> opinion Datatype.Vector doesn't work as expected. mpiJava doesn't
> support something similar to MPI_Type_create_resized so how can I use
> column_t in a scatter operation? Will scatter automatically start with
> the next element and not with the element following the extent of
> column_t?".

My mistake for not reading your mails carefully; sorry. :-(

> In my opinion Datatype.Vector must use the size of the
> base datatype as the extent of the vector, not the true extent, because
> MPI-Java doesn't provide a function to resize a datatype.

No, I think Datatype.Vector is doing the Right Thing in that it acts just like MPI_Type_vector. We do want these to be *bindings*, after all -- meaning that they should be pretty much a 1:1 mapping to the C bindings.

I think the real shortcoming is that there is no Datatype.Resized function. That can be fixed.
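To make the extent question concrete: for a vector built like column_t (P blocks of one double, stride Q), MPI_Type_vector's extent runs from the first selected element through the last one, which is (P - 1) * Q + 1 doubles. A scatter with a count of 1 starts rank r's block r extents into the send buffer, so rank 1 starts far past column 1; shrinking the extent to one double (what MPI_Type_create_resized does in C) makes rank r start at column r. The sketch below only does that arithmetic in plain Java. The class and method names are hypothetical, and it assumes the C extent rules carry over unchanged to Datatype.Vector.

```java
// Plain-Java arithmetic only (no MPI calls). Assumes the extent rules of
// C's MPI_Type_vector and that a scatter with count 1 starts rank r's
// block at r * extent. Class and method names are hypothetical.
public class VectorExtentSketch {
    // Extent, in base elements, of vector(count, blocklength 1, stride):
    // from the first selected element through the last one.
    static int vectorExtent(int count, int stride) {
        return (count - 1) * stride + 1;
    }

    // First base element of the block that a scatter (count 1) sends to rank r.
    static int scatterStart(int rank, int extentInElements) {
        return rank * extentInElements;
    }

    public static void main(String[] args) {
        int P = 4, Q = 6;                    // P rows, Q columns, P * Q = 24 doubles
        int extent = vectorExtent(P, Q);     // (4 - 1) * 6 + 1 = 19
        for (int r = 0; r < Q; r++) {
            System.out.printf("rank %d: start %d with vector extent, %d with extent 1%n",
                    r, scatterStart(r, extent), scatterStart(r, 1));
        }
        // With extent 19, rank 1 already starts at element 19 of a 24-element
        // buffer; with extent 1, rank r starts at the top of column r.
    }
}
```

This is why a resized type (or a Datatype.Resized binding) is the natural fix rather than changing what Datatype.Vector reports.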

> Furthermore,
> Datatype.Struct allows only a collection of elements of the same type,
> so you must use a data object if you want to scatter or broadcast
> data of different types in one operation.

Agreed. I'm not 100% sure why the original project made this design decision (I don't fully grok the explanation given in the paper; I'm not enough of a Java guy to know...), but it would be good if that could be fixed.

> We should forget
> ObjectScatterMain.java for the moment and concentrate on
> ObjectBroadcastMain.java, which I have sent three days ago to the list,
> because it has the same problem.
>
> 1) ColumnSendRecvMain.java
>
> I create a 2D matrix with (Java books would use "double[][] matrix",
> which is the same in my opinion, but I like C notation)
>
> double matrix[][] = new double[P][Q];

I noticed that if I used [][] in my version of the Scatter program, I got random results. But if I used [] and did my own offset indexing, it worked.

See my earlier guess that Java doesn't store a double[][] in one contiguous block of memory.
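For readers who aren't Java specialists: a Java double[][] is an array whose elements are references to separately allocated double[] rows, so nothing guarantees that row i + 1 follows row i in memory the way it does for C's double m[P][Q]. The sketch below demonstrates this in plain Java only (no mpiJava); flatIndex is a hypothetical helper for the hand-indexed alternative.

```java
// Plain Java, no MPI: shows that double[][] is an array of separate row
// arrays rather than one contiguous block like C's double m[P][Q].
public class ArrayOfArraysDemo {
    // Row-major index of element (i, j) in a flat array with Q columns.
    static int flatIndex(int i, int j, int Q) {
        return i * Q + j;
    }

    public static void main(String[] args) {
        int P = 3, Q = 4;
        double[][] matrix = new double[P][Q];

        // Each row is an independent object, so one row can be swapped
        // for an array of a different length.
        matrix[1] = new double[7];
        System.out.println(matrix[0].length);  // 4
        System.out.println(matrix[1].length);  // 7

        // The runtime types differ: double[] is "[D", double[][] is "[[D".
        System.out.println(new double[Q].getClass().getName());  // [D
        System.out.println(matrix.getClass().getName());         // [[D

        // Contiguous alternative: one flat array indexed by hand.
        double[] flat = new double[P * Q];
        flat[flatIndex(2, 3, Q)] = 42.0;        // row 2, column 3
        System.out.println(flat[11]);           // 42.0
    }
}
```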

> Next I create a column vector
>
> column_t = Datatype.Vector (P, 1, Q, MPI.DOUBLE);
> column_t.Commit ();
>
> which I can use in a send/recv-operation
>
> if (mytid == 0)
> {
>   /* send one column to each process */
>   for (i = 0; i < Q; ++i)
>   {
>     MPI.COMM_WORLD.Send (matrix, i, 1, column_t, i + 1, 0);
>   }
> }
> else
> {
>   MPI.COMM_WORLD.Recv (column, 0, P, MPI.DOUBLE, 0, 0);
> }
>
> This example doesn't depend on the extent of column_t, because I set
> the "offset" where every column starts (at least I think so :-) ).

Other than the [][] vs. [] thing, I agree.
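Once the matrix is a flat double[P * Q], it may help to spell out which elements a column send actually reads. The sketch below emulates, in plain Java and without MPI, the selection made by Datatype.Vector (P, 1, Q, MPI.DOUBLE) starting at offset i of a row-major double[P * Q]. It assumes mpiJava's offset argument counts array elements, as the Send call in the quoted code implies; selectColumn is a hypothetical helper, not part of mpiJava.

```java
import java.util.Arrays;

// Plain-Java emulation (no MPI) of the elements selected by a vector
// datatype with count P, blocklength 1, stride Q, starting at offset i
// in a contiguous double[P * Q] stored row by row.
public class ColumnVectorSketch {
    // Hypothetical helper: copies what Send(flat, i, 1, column_t, ...) would
    // read, assuming column_t = Vector(P, 1, Q, DOUBLE).
    static double[] selectColumn(double[] flat, int P, int Q, int i) {
        double[] column = new double[P];
        for (int k = 0; k < P; k++) {
            column[k] = flat[i + k * Q];   // k-th element lies k strides past the offset
        }
        return column;
    }

    public static void main(String[] args) {
        int P = 3, Q = 4;
        double[] flat = new double[P * Q];
        for (int r = 0; r < P; r++)
            for (int c = 0; c < Q; c++)
                flat[r * Q + c] = 10 * r + c;   // value encodes (row, column)

        System.out.println(Arrays.toString(selectColumn(flat, P, Q, 2)));
        // [2.0, 12.0, 22.0]
    }
}
```

With a double[][] instead, "offset i" has no single flat buffer to index into, which matches the random results seen with [][].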

> Java doesn't want the user to have any knowledge of memory layouts
> or addresses of data structures. That's the reason why I think that
> all necessary computations and transformations must be done in
> Datatype.Vector, MPI.COMM_WORLD.Send, and MPI.COMM_WORLD.Recv.
> Unfortunately, it seems that this is not the case.

I don't think that MPI can magically figure this out (or perhaps I don't know enough about Java to be correct).

If double[][] is a fundamentally different type (and storage format) than double[], what is MPI to do? How can it tell the difference?

> It is easy to see that process 1 doesn't get column 0. Your
> suggestion to allocate enough memory for a matrix (without defining
> a matrix) and doing all index computations yourself is in my opinion
> not applicable for a "normal" Java programmer (it's even hard for
> most C programmers :-) ). Hopefully you have an idea how to solve
> this problem so that all processes receive correct column values.

I'm afraid I don't, other than defining your own class which allocates memory contiguously, but overrides [] and [][] (I'm *assuming* you can do that in Java...?).
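Java has no operator overloading, so a class can't redefine [] or [][]; the usual substitute is a pair of accessor methods over one flat array. A minimal sketch of that idea follows. ContiguousMatrix and its methods are hypothetical names, and handing data() to mpiJava's Send/Recv with columnOffset(j) as the offset is an untested assumption.

```java
// Hypothetical row-major matrix backed by one contiguous double[].
public class ContiguousMatrix {
    private final int rows, cols;
    private final double[] data;   // element (i, j) lives at data[i * cols + j]

    public ContiguousMatrix(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new double[rows * cols];
    }

    public double get(int i, int j) { return data[index(i, j)]; }

    public void set(int i, int j, double v) { data[index(i, j)] = v; }

    // Offset of the first element of column j, e.g. for a vector-typed send.
    public int columnOffset(int j) { return index(0, j); }

    // The flat buffer that would be passed to a communication call.
    public double[] data() { return data; }

    public int rows() { return rows; }
    public int cols() { return cols; }

    // Bounds-checked row-major index.
    private int index(int i, int j) {
        if (i < 0 || i >= rows || j < 0 || j >= cols)
            throw new IndexOutOfBoundsException("(" + i + ", " + j + ")");
        return i * cols + j;
    }

    public static void main(String[] args) {
        ContiguousMatrix m = new ContiguousMatrix(3, 4);
        m.set(2, 1, 7.5);
        System.out.println(m.get(2, 1));         // 7.5
        System.out.println(m.data()[9]);         // 7.5 (2 * 4 + 1)
        System.out.println(m.columnOffset(3));   // 3
    }
}
```

The programmer writes m.get(i, j) instead of matrix[i][j], but never has to do the offset arithmetic by hand.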

> 2) ObjectBroadcastMain.java
>
> As I said above, it is my understanding that I can send a Java object
> when I use MPI.OBJECT and that the MPI implementation must perform all
> necessary tasks.

Remember: there is no standard for MPI and Java. So there is no "must". :-)

This is one research implementation that was created. We can update it and try to make it better, but we're somewhat crafting the rules as we go along here.

(BTW, if we continue detailed discussions about implementation, this conversation should probably move to the devel list...)

> Your interface for derived datatypes provides only
> methods for discontiguous data and no method to create an MPI.OBJECT,
> so I have no idea what I would have to do to create one. The
> object must be serializable so that you get the same values in a
> heterogeneous environment.
>
> tyr java 146 mpiexec -np 2 java ObjectBroadcastMain
> Exception in thread "main" java.lang.ClassCastException:
> MyData cannot be cast to [Ljava.lang.Object;
> at mpi.Comm.Object_Serialize(Comm.java:207)
> at mpi.Comm.Send(Comm.java:292)
> at mpi.Intracomm.Bcast(Intracomm.java:202)
> at ObjectBroadcastMain.main(ObjectBroadcastMain.java:44)
> ...
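On the requirement that the object be serializable: the Object_Serialize frame in this trace suggests that mpiJava's MPI.OBJECT path is built on standard Java serialization (an inference from the method name, not verified here). Under that assumption, a user class mainly needs to implement java.io.Serializable. The sketch below round-trips a hypothetical MyData, with invented fields, through ObjectOutputStream and ObjectInputStream. Java's serialized byte format does not depend on the host's byte order, which is what makes this approach usable across heterogeneous machines.

```java
import java.io.*;

public class SerializeSketch {
    // Hypothetical user class; implementing Serializable is what allows
    // ObjectOutputStream to write it.
    static class MyData implements Serializable {
        private static final long serialVersionUID = 1L;
        int id;
        double value;
        String label;

        MyData(int id, double value, String label) {
            this.id = id;
            this.value = value;
            this.label = label;
        }
    }

    // Object -> bytes (what a sender-side serialization layer would produce).
    static byte[] toBytes(Object obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(obj);
        }
        return bos.toByteArray();
    }

    // bytes -> Object (what the receiving side would do).
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        MyData original = new MyData(7, 3.25, "row 7");
        MyData copy = (MyData) fromBytes(toBytes(original));
        System.out.println(copy.id + " " + copy.value + " " + copy.label); // 7 3.25 row 7
    }
}
```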

After rooting around in the code a bit, I think I understand this stack trace better now.

The code line in question is in the Object_Serialize method, where it calls:

        Object buf_els [] = (Object[])buf;

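So it's trying to cast buf, which is declared as a plain Object, to an Object[]. That cast only succeeds if buf really is an array of reference types (e.g., a MyData[] or Integer[]); it fails when buf is a single instance of your own class, which is what the ClassCastException says.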

Again, here's my disclaimer that I'm not a Java guy... :-) But does this mean you need to define an operator[] method on your class, and that would allow this casting to work? (not that I'm sure what this method would need to *do*, but this is a first step...)
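An operator[] method isn't available in Java, since the language has no operator overloading. What the cast in Object_Serialize needs is for buf to actually be an array of references. The self-contained sketch below (no mpiJava involved; MyData is a stand-in class) shows which casts to Object[] succeed at runtime. It suggests, but does not confirm, that wrapping the object in a one-element MyData[] and passing that array (with a count of 1) as the buffer would get past this particular cast.

```java
public class CastSketch {
    // Stand-in for the user's class.
    static class MyData implements java.io.Serializable {
        int id = 1;
    }

    // Mirrors the cast in Object_Serialize: true if (Object[]) buf succeeds.
    static boolean castsToObjectArray(Object buf) {
        try {
            Object[] elements = (Object[]) buf;   // same cast as in mpiJava's code
            return true;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        MyData single = new MyData();
        MyData[] wrapped = { single };

        System.out.println(castsToObjectArray(single));          // false: not an array
        System.out.println(castsToObjectArray(wrapped));         // true: MyData[] is an Object[]
        System.out.println(castsToObjectArray(new int[3]));      // false: int[] is not an Object[]
        System.out.println(castsToObjectArray(new Integer[3]));  // true
    }
}
```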

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/