Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Loading Open MPI from MPJ Express (Java) fails
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2014-03-24 13:16:15


On Mar 14, 2014, at 9:29 AM, Bibrak Qamar <bibrakc_at_[hidden]> wrote:

> It works for Open MPI, but for MPICH3 I have to comment out the dlopen. Is there any way to tell the compiler that if it is using Open MPI (mpicc) then use dlopen, else keep it commented out? Or something else?

In Open MPI's mpi.h, we define OPEN_MPI. You can therefore use #if defined(OPEN_MPI).
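
For example, something like this in your native device code (just a sketch with made-up function/variable names; the exact library name and dlopen flags for your environment may differ):

#include <dlfcn.h>
#include <mpi.h>   /* OPEN_MPI is defined here when building with Open MPI */

/* Pre-load the MPI library only when compiling against Open MPI. */
static void preload_libmpi(void)
{
#if defined(OPEN_MPI)
    /* Open MPI: load libmpi with RTLD_GLOBAL so its symbols are visible
       to the components that Open MPI dlopen's later.  The library name
       is platform dependent (e.g., libmpi.dylib on OS X). */
    void *handle = dlopen("libmpi.so", RTLD_NOW | RTLD_GLOBAL);
    if (NULL == handle) {
        /* report the dlopen error here */
    }
#else
    /* MPICH3 (and others): no explicit dlopen needed. */
#endif
}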

> Yes, there are some places where we need to be in sync with the internals of the native MPI implementation. These are in section 8.1.2 of MPI 2.1 (http://www.mpi-forum.org/docs/mpi-2.1/mpi21-report.pdf). For example, MPI_TAG_UB: for the pure Java devices of MPJ Express we have always used Integer.MAX_VALUE.
>
> Datatypes?
>
> MPJ Express uses an internal buffering layer to buffer the user data into a ByteBuffer. In this way, for the native device, we end up using the MPI_BYTE datatype most of the time. ByteBuffer simplifies matters since it is directly accessible from the native code.

Does that mean you can't do heterogeneous? (not really a huge deal, since most people don't run heterogeneously, but something to consider)

> With our current implementation there is one exception to this, i.e. in Reduce, Allreduce and Reduce_scatter, where the native MPI implementation needs to know which Java datatype it's going to process. The same goes for MPI.Op.

And Accumulate and the other Op-using functions, right?

> On "Are your bindings similar in style/signature to ours?"

No, they use the real datatypes.

> I checked, and there are differences. MPJ Express (and also FastMPJ) implements the mpiJava 1.2 specification. There is also the MPJ API (which is very close to the mpiJava 1.2 API).
>
> Example 1: Getting the rank and size of COMM_WORLD
>
> MPJ Express (the mpiJava 1.2 API):
> public int Size() throws MPIException;
> public int Rank() throws MPIException;
>
> MPJ API:
> public int size() throws MPJException;
> public int rank() throws MPJException;
>
> Open MPI's Java bindings:
> public final int getRank() throws MPIException;
> public final int getSize() throws MPIException;

Right -- we *started* with the old ideas, but then made the conscious choice to update the Java bindings in a few ways:

- make them more like modern Java conventions (e.g., camel case, use verbs, etc.)
- get rid of MPI.OBJECT
- use modern, efficient Java practices
- basically, we didn't want to be bound by any Java decisions that were made long ago that aren't necessarily relevant any more
- and to be clear: we couldn't find many existing Java MPI codes, so compatibility with existing Java MPI codes was not a big concern

One thing we didn't do was use bounce buffers for small messages, which shows up in your benchmarks. We're considering adding that optimization, and others.

> Example 2: Point-to-Point communication
>
> MPJ Express (the mpiJava 1.2 API):
> public void Send(Object buf, int offset, int count, Datatype datatype, int dest, int tag) throws MPIException
>
> public Status Recv(Object buf, int offset, int count, Datatype datatype,
> int source, int tag) throws MPIException
>
> MPJ API:
> public void send(Object buf, int offset, int count, Datatype datatype, int dest, int tag) throws MPJException;
>
> public Status recv(Object buf, int offset, int count, Datatype datatype, int source, int tag) throws MPJException
>
> Open MPI's Java bindings:
> public final void send(Object buf, int count, Datatype type, int dest, int tag) throws MPIException
>
> public final Status recv(Object buf, int count, Datatype type, int source, int tag) throws MPIException
>
> Example 3: Collective communication
>
> MPJ Express (the mpiJava 1.2 API):
> public void Bcast(Object buf, int offset, int count, Datatype type, int root)
> throws MPIException;
>
> MPJ API:
> public void bcast(Object buffer, int offset, int count, Datatype datatype, int root) throws MPJException;
>
> Open MPI's Java bindings:
> public final void bcast(Object buf, int count, Datatype type, int root) throws MPIException;
>
>
> I couldn't find which API Open MPI's Java bindings implement?

Our own. :-)

> But while reading your README.JAVA.txt and your code, I realised that you are trying to avoid the buffering overhead by giving the user the flexibility to allocate data directly in a ByteBuffer using MPI.new<Type>Buffer, hence not following the mpiJava 1.2 spec (for communication operations)?

Correct.

> On Performance Comparison
>
> Yes, this is interesting. I have managed to do two kinds of tests: Ping-Pong (Latency and Bandwidth) and Collective Communication (Bcast).
>
> Attached are the graphs and the programs (test cases) that I used. The tests were done using InfiniBand; more on the platform here: http://www.nust.edu.pk/INSTITUTIONS/Centers/RCMS/AboutUs/facilities/screc/Pages/Resources.aspx
>
> One reason for the Open MPI Java bindings' low performance (in the Bandwidth.png graph) is the way the test case was written (Bandwidth_OpenMPi.java). It allocates a 16 MB byte array and uses the same array in send/recv for each data point (by varying the count).
>
> This could be mainly because of the following code in mpi_Comm.c (let me know if I am mistaken)
>
> static void* getArrayPtr(void** bufBase, JNIEnv *env,
>                          jobject buf, int baseType, int offset)
> {
>     switch(baseType)
>     {
>         ...
>         case 1: {
>             jbyte* els = (*env)->GetByteArrayElements(env, buf, NULL);
>             *bufBase = els;
>             return els + offset;
>         }
>         ...
>     }
> }
>
> The Get<PrimitiveType>ArrayElements routine gets the entire array every time, even if the user only wants to send some of the elements (the count). This might be one reason for Open MPI's Java bindings to advocate for MPI.new<Type>Buffer. The other reason is naturally the buffering overhead.

Yes.

There's *always* going to be a penalty to pay if you don't use native buffers, just due to the nature of Java garbage collection, etc.
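
To illustrate in JNI terms (a rough sketch, not the actual binding code; the function names are made up):

#include <jni.h>

/* Array path: GetByteArrayElements may copy the *entire* backing
   array, regardless of how many elements (count) will actually be
   sent, and the elements must be released again afterwards. */
static void *get_array_ptr(JNIEnv *env, jbyteArray buf, int offset,
                           void **bufBase)
{
    jbyte *els = (*env)->GetByteArrayElements(env, buf, NULL);
    *bufBase = els;   /* release later with ReleaseByteArrayElements() */
    return els + offset;
}

/* Direct-buffer path: a direct ByteBuffer (e.g., one returned by
   MPI.new<Type>Buffer) lives in off-heap memory, so the native side
   can use it in place with no copy at all. */
static void *get_direct_ptr(JNIEnv *env, jobject byteBuffer)
{
    return (*env)->GetDirectBufferAddress(env, byteBuffer);
}

That difference is essentially why we suggest using the direct buffers from MPI.new<Type>Buffer.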

> From the above experience, for the bandwidth of the Bcast operation, I modified the test case to allocate only as much array as needed for that Bcast and took the results. For a fairer comparison between MPJ Express and Open MPI's Java bindings, I didn't use MPI.new<Type>Buffer.

It would be interesting to see how using the native buffers compares, too -- i.e., are we correct in advocating for the use of native buffers?

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/