Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] [Open MPI] #3351: JAVA scatter error
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2012-12-17 21:10:53


On Dec 15, 2012, at 10:46 AM, Siegmar Gross wrote:

> If I misunderstood the mpiJava specification and I must create a special
> MPI object from my Java object: How do I create it? Thank you very much
> for any help in advance.

You sent me a source code listing off-list, but I want to reply on-list for a few reasons:

1. We actively do want feedback on these Java bindings
2. There is some documentation about this, but these bindings *are* different from the C bindings (and Java just behaves differently from C), so it's worth documenting here in a Google-able location

I attached the source code listing you sent.

It is closer to correct, but I still don't think it's quite right. There are two issues here:

1. I have no idea how Java stores 2D arrays of doubles. I.e., you're using "double matrix[][]". I don't know if all P*Q values are stored contiguously in memory (or, more specifically, if the Java language *guarantees* that that will always be so).

2. Your MPI vector is almost right, but there's a subtle issue about MPI vectors that you're missing.
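
On issue 1: as far as the Java language itself goes, a double[][] is an array of references to separate double[] row objects, so nothing guarantees that the rows sit next to each other in memory. Here is a minimal sketch (FlatMatrix and flatten are made-up names for illustration, not part of any binding) of copying such a matrix into one flat row-major array, where the {i,j} element lives at index i * Q + j:

```java
// Illustrative sketch only; FlatMatrix and flatten are hypothetical names.
final class FlatMatrix {
    // Copy a rectangular double[][] into a single row-major double[].
    static double[] flatten(double[][] m) {
        int rows = m.length;
        int cols = (rows == 0) ? 0 : m[0].length;
        double[] flat = new double[rows * cols];
        for (int i = 0; i < rows; ++i) {
            // Each m[i] is its own array object, so copy row by row.
            System.arraycopy(m[i], 0, flat, i * cols, cols);
        }
        return flat;
    }

    public static void main(String[] args) {
        double[][] m = { {1, 2, 3}, {4, 5, 6} };
        double[] flat = flatten(m);
        // Element {1,2} of the 2 x 3 matrix is at index 1 * 3 + 2 = 5.
        System.out.println(flat[1 * 3 + 2]); // prints 6.0
    }
}
```
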

----
Because of #1, I changed your program to use matrix[], and have it malloc a single P*Q array.  Then I always accessed the {i,j} element via matrix[i * Q + j].  In this way, Java seems to keep all the values contiguously in memory.

That leads to this conversion of your program:
-----
import mpi.*;
public class ColumnScatterMain {
    static final int P = 4;
    static final int Q = 6;
    static final int NUM_ELEM_PER_LINE = 6;
    
    public static void main (String args[]) throws MPIException, InterruptedException
    {
	int      ntasks, mytid, i, j, tmp;
	double   matrix[], column[];
	Datatype column_t;
	
	MPI.Init (args);
	matrix = new double[P * Q];
	column = new double[P];
	mytid  = MPI.COMM_WORLD.Rank ();
	ntasks = MPI.COMM_WORLD.Size ();
	if (mytid == 0) {
	    if (ntasks != Q) {
		System.err.println ("\n\nI need exactly " + Q +
				    " processes.\n\n" +
				    "Usage:\n" +
				    "  mpiexec -np " + Q + 
				    " java <program name>\n");
	    }
	}
	if (ntasks != Q) {
	    MPI.Finalize ();
	    System.exit (0);
	}
	column_t = Datatype.Vector (P, 1, Q, MPI.DOUBLE);
	column_t.Commit ();
	if (mytid == 0) {
	    tmp = 1;
	    System.out.println ("\nmatrix:\n");
	    for (i = 0; i < P; ++i) {
		for (j = 0; j < Q; ++j) {
		    matrix[i * Q + j] = tmp++;
		    System.out.printf ("%10.2f", matrix[i * Q + j]);
		}
		System.out.println ();
	    }
	    System.out.println ();
	}
	MPI.COMM_WORLD.Scatter (matrix, 0, 1, column_t,
				column, 0, P, MPI.DOUBLE, 0);
	Thread.sleep(1000 * mytid); // Sleep to get ordered output
	System.out.println ("\nColumn of process " + mytid + "\n");
	for (i = 0; i < P; ++i) {
	    if (((i + 1) % NUM_ELEM_PER_LINE) == 0) {
		System.out.printf ("%10.2f\n", column[i]);
	    } else {
		System.out.printf ("%10.2f", column[i]);
	    }
	}
	System.out.println ();
	column_t.finalize ();
	MPI.Finalize();
    }
}
-----
Notice that the output for process 0 after the scatter is correct -- it shows that it received values 1, 7, 13, 19 for its column.  But all other processes are wrong.

Why?

Because of #2.  Notice that process 1 got values 20, 0, 0, 0 (or, more specifically, 20, junk, junk, junk).

That's because the extent of the vector datatype you created runs from its first element to its last: (P-1)*Q + 1 = 19 doubles, so it ended right at element 19.  MPI therefore started the next vector (i.e., the one to send to process 1) at the next element -- element 20.  It then went on in the same memory pattern from there, but the very next stride was already beyond the end of the 24-element array.

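The index arithmetic can be mimicked without MPI at all. The following is only a sketch (VectorExtentDemo and indicesFor are hypothetical names, and nothing here calls the bindings); it assumes that the copy of the datatype for a given rank begins rank * extent elements into the send buffer, and compares the default extent of 19 with an extent of a single double:

```java
// Illustrative sketch only; VectorExtentDemo and its methods are hypothetical
// names and do not call MPI. They just mimic the index arithmetic.
final class VectorExtentDemo {
    static final int P = 4;  // rows
    static final int Q = 6;  // columns

    // Zero-based indices into the flat matrix touched by a vector of P blocks
    // of 1 element with stride Q, when the copy for this rank begins
    // rank * extent elements into the send buffer.
    static int[] indicesFor(int rank, int extent) {
        int[] idx = new int[P];
        for (int i = 0; i < P; ++i) {
            idx[i] = rank * extent + i * Q;
        }
        return idx;
    }

    public static void main(String[] args) {
        int defaultExtent = (P - 1) * Q + 1;  // 19: first to last element
        int[] got = indicesFor(1, defaultExtent);
        int[] wanted = indicesFor(1, 1);      // extent of one double
        System.out.println(java.util.Arrays.toString(got));    // [19, 25, 31, 37]
        System.out.println(java.util.Arrays.toString(wanted)); // [1, 7, 13, 19]
    }
}
```

Index 19 holds the value 20, and indices 25, 31 and 37 lie outside the 24-element array, which matches the 20, junk, junk, junk that process 1 printed.
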
Go google a tutorial on MPI_Type_vector and you'll see what I mean.

In C or Fortran, the solution would be to use an MPI_UB marker at the end of the vector to artificially make the "next" vector start at element 2 (vs. element 20).  By the description in 3.12, it looks like they explicitly disallowed this (or, I guess, they didn't implement LB/UB properly -- but MPI_LB and MPI_UB were deprecated in MPI-2.0 and removed in MPI-3.0, anyway).  But I think it could be done with MPI_TYPE_CREATE_RESIZED, which, unfortunately, doesn't look like it is implemented in these Java bindings yet.

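Until a resized type is available, one possible workaround is to reorder the matrix on the root so that each column is contiguous, and then scatter P plain doubles per process. This is offered as an untested sketch rather than anything these bindings document; ColumnPacker and packColumns are hypothetical names, and the code below only does the packing, not the MPI call:

```java
// Illustrative sketch only; ColumnPacker and packColumns are hypothetical names.
final class ColumnPacker {
    // Reorder a row-major rows x cols matrix so each column is contiguous:
    // column j occupies packed[j * rows] .. packed[j * rows + rows - 1].
    static double[] packColumns(double[] matrix, int rows, int cols) {
        double[] packed = new double[rows * cols];
        for (int i = 0; i < rows; ++i) {
            for (int j = 0; j < cols; ++j) {
                packed[j * rows + i] = matrix[i * cols + j];
            }
        }
        return packed;
    }

    public static void main(String[] args) {
        int P = 4, Q = 6;
        double[] matrix = new double[P * Q];
        for (int k = 0; k < P * Q; ++k) {
            matrix[k] = k + 1;  // same 1..24 fill as the listing
        }
        double[] packed = packColumns(matrix, P, Q);
        // Process 1's chunk is packed[4..7], i.e. the second column.
        System.out.println(java.util.Arrays.toString(
            java.util.Arrays.copyOfRange(packed, P, 2 * P)));  // [2.0, 8.0, 14.0, 20.0]
    }
}
```

The packed array would then presumably go to the same Scatter call as in the listing, with a send count of P and MPI.DOUBLE in place of column_t. That last step is an assumption and is not shown running here.
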
Make sense?
-- 
Jeff Squyres
jsquyres_at_[hidden]