Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Loading Open MPI from MPJ Express (Java) fails
From: Bibrak Qamar (bibrakc_at_[hidden])
Date: 2014-03-14 09:29:49


And I managed to run Open MPI with MPJ Express. I added the following code
and it worked like a charm.

*In Java*
  /*
   * Static Block for loading the libnativempjdev.so
   */
  static {
    System.loadLibrary("nativempjdev");

    if(!loadGlobalLibraries()) {
        System.out.println("MPJ Express failed to load required libraries");
        System.exit(1);
    }
  }

*In C*

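#include <dlfcn.h>
#include <jni.h>

/* Handle to the MPI library (declaration assumed; not shown in the
 * original message). */
static void *mpilibhandle = NULL;

JNIEXPORT jboolean JNICALL Java_mpjdev_natmpjdev_Comm_loadGlobalLibraries
 (JNIEnv *env, jclass thisObject) {
    /* Load libmpi into the global namespace so that, in the case of
     * Open MPI, its symbols are visible to later-loaded components. */
    if (NULL == (mpilibhandle = dlopen("libmpi.so",
                                       RTLD_NOW | RTLD_GLOBAL))) {
        return JNI_FALSE;
    }
    return JNI_TRUE;
}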

It works for Open MPI, but for MPICH3 I have to comment out the dlopen. Is
there any way to tell at compile time whether Open MPI (its mpicc) is being
used, so that the dlopen is compiled in only then and left out otherwise? Or
is there some other approach?

*On whether the Java bindings need some insight into the internals of the MPI
implementation*

Yes, there are some places where we need to be in sync with the internals of
the native MPI implementation. These are listed in section 8.1.2 of MPI 2.1
(http://www.mpi-forum.org/docs/mpi-2.1/mpi21-report.pdf); MPI_TAG_UB is one
example. For the pure Java devices of MPJ Express we have always used
Integer.MAX_VALUE.
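
To illustrate the kind of implementation-dependent value involved, here is a
minimal Java sketch of a tag check against a device-specific upper bound. The
class and method names are hypothetical and are not MPJ Express code. The only
facts it relies on are that the pure Java devices use Integer.MAX_VALUE and
that the MPI standard guarantees MPI_TAG_UB to be at least 32767; a native
device would have to obtain the real bound from the underlying MPI library.

```java
final class TagBoundSketch {
    /** Upper bound used by the pure Java devices, as stated above. */
    static final int PURE_JAVA_TAG_UB = Integer.MAX_VALUE;

    /** Smallest MPI_TAG_UB the MPI standard allows an implementation to report. */
    static final int MPI_MIN_TAG_UB = 32767;

    /** Returns true if tag is usable on a device whose upper bound is tagUb. */
    static boolean isValidTag(int tag, int tagUb) {
        return tag >= 0 && tag <= tagUb;
    }

    public static void main(String[] args) {
        int tag = 100_000;
        // Accepted by the pure Java bound, rejected by the minimal native bound.
        System.out.println("pure Java device:     " + isValidTag(tag, PURE_JAVA_TAG_UB));
        System.out.println("minimal native bound: " + isValidTag(tag, MPI_MIN_TAG_UB));
    }
}
```

A tag that is fine on a pure Java device may therefore be out of range on a
native device, which is why the bound has to come from the native library.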

*Datatypes?*

MPJ Express uses an internal buffering layer to buffer the user data into a
ByteBuffer. As a result, the native device ends up using the MPI_BYTE
datatype most of the time. ByteBuffer simplifies matters since it is directly
accessible from the native code.
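
As a rough illustration of this buffering idea (not the actual MPJ Express
buffering layer), the sketch below packs a slice of an int array into a direct
ByteBuffer in native byte order, so that the payload can be handed over as
plain bytes. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class BufferPackSketch {
    /** Packs count ints of src, starting at offset, into a direct ByteBuffer. */
    static ByteBuffer packInts(int[] src, int offset, int count) {
        ByteBuffer buf = ByteBuffer.allocateDirect(count * Integer.BYTES)
                                   .order(ByteOrder.nativeOrder());
        // The int view shares storage with buf; writing through it fills buf.
        buf.asIntBuffer().put(src, offset, count);
        return buf;
    }

    /** Reads count ints back out of a buffer produced by packInts. */
    static int[] unpackInts(ByteBuffer buf, int count) {
        int[] dst = new int[count];
        buf.asIntBuffer().get(dst, 0, count);
        return dst;
    }

    public static void main(String[] args) {
        ByteBuffer buf = packInts(new int[] {1, 2, 3, 4, 5}, 1, 3);
        // The native side would see buf.capacity() bytes, sent as MPI_BYTE.
        System.out.println("bytes to send: " + buf.capacity());
    }
}
```

Only the requested range is copied, and the message size in bytes is simply
count times the element size.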

With our current implementation there is one exception: in Reduce,
Allreduce and Reduce_scatter the native MPI implementation needs to know
which Java datatype it is going to process. The same goes for MPI.Op.
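
A small Java sketch of why the element type matters for reductions: summing
two encoded ints byte by byte gives a different answer from summing them as
ints, because carries between bytes are lost. This only illustrates the
reasoning; the class and method names are made up for the example.

```java
import java.nio.ByteBuffer;

final class ReduceTypeSketch {
    /** Encodes one int as 4 bytes (ByteBuffer's default big-endian order). */
    static byte[] encode(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    /** Sum with the element type known: decode both operands as ints first. */
    static int sumAsInt(byte[] a, byte[] b) {
        return ByteBuffer.wrap(a).getInt() + ByteBuffer.wrap(b).getInt();
    }

    /** Sum with the type lost: add byte by byte, then reinterpret as an int. */
    static int sumAsBytes(byte[] a, byte[] b) {
        byte[] out = new byte[Integer.BYTES];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) (a[i] + b[i]); // carries between bytes are dropped
        }
        return ByteBuffer.wrap(out).getInt();
    }

    public static void main(String[] args) {
        byte[] a = encode(255);
        byte[] b = encode(1);
        System.out.println(sumAsInt(a, b));   // 256
        System.out.println(sumAsBytes(a, b)); // 0
    }
}
```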

*On "Are your bindings similar in style/signature to ours?"*

I checked it and there are differences. MPJ Express (and FastMPJ also)
implements the mpiJava 1.2 specifications. There is also MPJ API (this is
very close to mpiJava 1.2 API).

*Example 1: Getting the rank and size of COMM_WORLD*

*MPJ Express (the mpiJava 1.2 API):*
 public int Size() throws MPIException;
 public int Rank() throws MPIException;

*MPJ API:*
 public int size() throws MPJException;
 public int rank() throws MPJException;

*Open MPI's Java bindings:*
 public final int getRank() throws MPIException;
 public final int getSize() throws MPIException;

*Example 2: Point-to-Point communication*

*MPJ Express (the mpiJava 1.2 API):*
 public void Send(Object buf, int offset, int count, Datatype datatype,
                  int dest, int tag) throws MPIException

 public Status Recv(Object buf, int offset, int count, Datatype datatype,
                    int source, int tag) throws MPIException

*MPJ API:*
 public void send(Object buf, int offset, int count, Datatype datatype,
                  int dest, int tag) throws MPJException;

 public Status recv(Object buf, int offset, int count, Datatype datatype,
                    int source, int tag) throws MPJException

*Open MPI's Java bindings:*
 public final void send(Object buf, int count, Datatype type,
                        int dest, int tag) throws MPIException

 public final Status recv(Object buf, int count, Datatype type,
                          int source, int tag) throws MPIException

*Example 3: Collective communication*

*MPJ Express (the mpiJava 1.2 API):*
 public void Bcast(Object buf, int offset, int count, Datatype type,
                   int root) throws MPIException;

*MPJ API:*
 public void bcast(Object buffer, int offset, int count, Datatype datatype,
                   int root) throws MPJException;

*Open MPI's Java bindings:*
 public final void bcast(Object buf, int count, Datatype type,
                         int root) throws MPIException;

I couldn't find which API Open MPI's Java bindings implement. But while
reading your README.JAVA.txt and your code I realised that you are trying
to avoid buffering overhead by giving the user the flexibility to directly
allocate data onto a ByteBuffer using MPI.new<Type>Buffer, hence not
following the mpiJava 1.2 specs (for communication operations)?
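
One way to picture the signature difference: the mpiJava 1.2 style calls carry
an explicit offset, while a count-only call leaves positioning to the buffer
itself. The sketch below is a hypothetical adapter, using only standard
java.nio calls, that emulates an offset-style send on top of a count-only one
by slicing a ByteBuffer. CountOnlySend is a stand-in interface invented for
the example and is not part of any of the three APIs.

```java
import java.nio.ByteBuffer;

final class OffsetAdapterSketch {
    /** Hypothetical stand-in for a count-only send(buf, count, dest, tag). */
    interface CountOnlySend {
        void send(ByteBuffer buf, int count, int dest, int tag);
    }

    /** Emulates send(buf, offset, count, ...) by passing a slice of buf. */
    static void sendWithOffset(CountOnlySend target, ByteBuffer buf,
                               int offset, int count, int dest, int tag) {
        // Work on a duplicate so the caller's position and limit stay untouched.
        ByteBuffer view = buf.duplicate();
        view.position(offset);
        view.limit(offset + count);
        // The slice starts at offset and holds exactly count bytes, no copy made.
        target.send(view.slice(), count, dest, tag);
    }

    public static void main(String[] args) {
        ByteBuffer data = ByteBuffer.allocateDirect(16);
        sendWithOffset((b, c, d, t) ->
                System.out.println("sending " + b.remaining() + " bytes to rank " + d),
                data, 4, 8, 1, 0);
    }
}
```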

*On Performance Comparison*

Yes, this is interesting. I have managed to do two kinds of tests: Ping-Pong
(Latency and Bandwidth) and Collective Communications (Bcast).

Attached are graphs and the programs (test cases) that I used. The tests
were done using InfiniBand; more on the platform here:
http://www.nust.edu.pk/INSTITUTIONS/Centers/RCMS/AboutUs/facilities/screc/Pages/Resources.aspx

One reason for the low performance of Open MPI's Java bindings (in the
Bandwidth.png graph) is the way the test case was written
(Bandwidth_OpenMPi.java). It allocates a single 16M byte array and uses the
same array in send/recv for each data point (by varying count).

This could be mainly because of the following code in mpi_Comm.c (let me
know if I am mistaken):

static void* getArrayPtr(void** bufBase, JNIEnv *env,
                         jobject buf, int baseType, int offset)
{
    switch(baseType)
    {
        ...
        case 1: {
            /* Obtains the elements of the whole array, not just a range */
            jbyte* els = (*env)->GetByteArrayElements(env, buf, NULL);
            *bufBase = els;
            return els + offset;
        }
        ...
    }
}

The Get<PrimitiveType>ArrayElements routine gets the entire array every time,
even if the user wants to send only some elements (the count). This might be
one reason for Open MPI's Java bindings to advocate the use of
MPI.new<Type>Buffer. The other reason is naturally the buffering overhead.
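
To put rough numbers on this, here is a small Java model of the two copying
strategies. It assumes the JVM copies the array when
Get<PrimitiveType>ArrayElements is called; JNI also allows the VM to pin the
array instead, so this is a worst-case picture and not a measurement. The
region variant mirrors what JNI's Get<PrimitiveType>ArrayRegion does. The
class and method names are hypothetical.

```java
import java.util.Arrays;

final class ArrayCopyCostSketch {
    /** Models a whole-array copy: every byte is duplicated, whatever the count. */
    static byte[] copyWhole(byte[] array) {
        return Arrays.copyOf(array, array.length);
    }

    /** Models a region copy: only [offset, offset + count) is duplicated. */
    static byte[] copyRegion(byte[] array, int offset, int count) {
        return Arrays.copyOfRange(array, offset, offset + count);
    }

    /** How many times more bytes the whole-array copy moves than the message needs. */
    static long overheadFactor(int arrayLength, int count) {
        return arrayLength / count;
    }

    public static void main(String[] args) {
        int arrayLength = 16 * 1024 * 1024; // the 16M array from the test case
        for (int count = 1024; count <= arrayLength; count *= 16) {
            System.out.println(count + " byte message: whole-array copy moves "
                    + overheadFactor(arrayLength, count) + "x the payload");
        }
    }
}
```

Under that assumption, a 1 KiB message taken from the 16M array costs as much
copying as a 16 MiB message, which would flatten the small-message end of a
bandwidth curve.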

Based on the above experience, for the bandwidth of the Bcast operation I
modified the test case to allocate only as large an array as needed for that
Bcast, and took the results. For a fairer comparison between MPJ Express and
Open MPI's Java bindings I didn't use MPI.new<Type>Buffer.

regards
Bibrak Qamar

On Wed, Mar 12, 2014 at 6:42 PM, Ralph Castain <rhc_at_[hidden]> wrote:

> Here's what I had to do to load the library correctly (we were only using
> ORTE, so substitute "libmpi") - this was called at the beginning of "init":
>
> /* first, load the required ORTE library */
> #if OPAL_WANT_LIBLTDL
> lt_dladvise advise;
>
> if (lt_dlinit() != 0) {
> fprintf(stderr, "LT_DLINIT FAILED - CANNOT LOAD LIBMRPLUS\n");
> return JNI_FALSE;
> }
>
> #if OPAL_HAVE_LTDL_ADVISE
> /* open the library into the global namespace */
> if (lt_dladvise_init(&advise)) {
> fprintf(stderr, "LT_DLADVISE INIT FAILED - CANNOT LOAD LIBMRPLUS\n");
> return JNI_FALSE;
> }
>
> if (lt_dladvise_ext(&advise)) {
> fprintf(stderr, "LT_DLADVISE EXT FAILED - CANNOT LOAD LIBMRPLUS\n");
> lt_dladvise_destroy(&advise);
> return JNI_FALSE;
> }
>
> if (lt_dladvise_global(&advise)) {
> fprintf(stderr, "LT_DLADVISE GLOBAL FAILED - CANNOT LOAD LIBMRPLUS\n");
> lt_dladvise_destroy(&advise);
> return JNI_FALSE;
> }
>
> /* we don't care about the return value
> * on dlopen - it might return an error
> * because the lib is already loaded,
> * depending on the way we were built
> */
> lt_dlopenadvise("libopen-rte", advise);
> lt_dladvise_destroy(&advise);
> #else
> fprintf(stderr, "NO LT_DLADVISE - CANNOT LOAD LIBMRPLUS\n");
> /* need to balance the ltdl inits */
> lt_dlexit();
> /* if we don't have advise, then we are hosed */
> return JNI_FALSE;
> #endif
> #endif
> /* if dlopen was disabled, then all symbols
> * should have been pulled up into the libraries,
> * so we don't need to do anything as the symbols
> * are already available.
> */
>
> On Mar 12, 2014, at 6:32 AM, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]>
> wrote:
>
> > Check out how we did this with the embedded java bindings in Open MPI;
> > see the comment describing exactly this issue starting here:
> >
> > https://svn.open-mpi.org/trac/ompi/browser/trunk/ompi/mpi/java/c/mpi_MPI.c#L79
> >
> > Feel free to compare MPJ to the OMPI java bindings -- they're shipping
> > in 1.7.4 and have a bunch of improvements in the soon-to-be-released 1.7.5,
> > but you must enable them since they aren't enabled by default:
> >
> > ./configure --enable-mpi-java ...
> >
> > FWIW, we found a few places in the Java bindings where it was necessary
> > for the bindings to have some insight into the internals of the MPI
> > implementation. Did you find the same thing with MPJ Express?
> >
> > Are your bindings similar in style/signature to ours?
> >
> >
> >
> > On Mar 12, 2014, at 6:40 AM, Bibrak Qamar <bibrakc_at_[hidden]> wrote:
> >
> >> Hi all,
> >>
> >> I am writing a new device for MPJ Express that uses a native MPI
> >> library for communication. Its based on JNI wrappers like the original
> >> mpiJava. The device works fine with MPICH3 (and MVAPICH2.2). Here is the
> >> issue with loading Open MPI 1.7.4 from MPJ Express.
> >>
> >> I have generated the following error message from a simple JNI to MPI
> >> application for clarity purposes and also to regenerate the error easily. I
> >> have attached the app for your consideration.
> >>
> >>
> >> [bibrak_at_localhost JNI_to_MPI]$ mpirun -np 2 java -cp .
> -Djava.library.path=/home/bibrak/work/JNI_to_MPI/ simpleJNI_MPI
> >> [localhost.localdomain:29086] mca: base: component_find: unable to open
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_mmap:
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_mmap.so:
> undefined symbol: opal_show_help (ignored)
> >> [localhost.localdomain:29085] mca: base: component_find: unable to open
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_mmap:
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_mmap.so:
> undefined symbol: opal_show_help (ignored)
> >> [localhost.localdomain:29085] mca: base: component_find: unable to open
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_posix:
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_posix.so:
> undefined symbol: opal_shmem_base_framework (ignored)
> >> [localhost.localdomain:29086] mca: base: component_find: unable to open
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_posix:
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_posix.so:
> undefined symbol: opal_shmem_base_framework (ignored)
> >> [localhost.localdomain:29086] mca: base: component_find: unable to open
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_sysv:
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_sysv.so:
> undefined symbol: opal_show_help (ignored)
> >>
> --------------------------------------------------------------------------
> >> It looks like opal_init failed for some reason; your parallel process is
> >> likely to abort. There are many reasons that a parallel process can
> >> fail during opal_init; some of which are due to configuration or
> >> environment problems. This failure appears to be an internal failure;
> >> here's some additional information (which may only be relevant to an
> >> Open MPI developer):
> >>
> >> opal_shmem_base_select failed
> >> --> Returned value -1 instead of OPAL_SUCCESS
> >>
> --------------------------------------------------------------------------
> >> [localhost.localdomain:29085] mca: base: component_find: unable to open
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_sysv:
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_sysv.so:
> undefined symbol: opal_show_help (ignored)
> >>
> --------------------------------------------------------------------------
> >> It looks like orte_init failed for some reason; your parallel process is
> >> likely to abort. There are many reasons that a parallel process can
> >> fail during orte_init; some of which are due to configuration or
> >> environment problems. This failure appears to be an internal failure;
> >> here's some additional information (which may only be relevant to an
> >> Open MPI developer):
> >>
> >> opal_init failed
> >> --> Returned value Error (-1) instead of ORTE_SUCCESS
> >>
> --------------------------------------------------------------------------
> >>
> --------------------------------------------------------------------------
> >> It looks like MPI_INIT failed for some reason; your parallel process is
> >> likely to abort. There are many reasons that a parallel process can
> >> fail during MPI_INIT; some of which are due to configuration or
> environment
> >> problems. This failure appears to be an internal failure; here's some
> >> additional information (which may only be relevant to an Open MPI
> >> developer):
> >>
> >> ompi_mpi_init: ompi_rte_init failed
> >> --> Returned "Error" (-1) instead of "Success" (0)
> >>
> --------------------------------------------------------------------------
> >> *** An error occurred in MPI_Init
> >> *** on a NULL communicator
> >> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> >> *** and potentially your MPI job)
> >> [localhost.localdomain:29086] Local abort before MPI_INIT completed
> successfully; not able to aggregate error messages, and not able to
> guarantee that all other processes were killed!
> >>
> --------------------------------------------------------------------------
> >> It looks like opal_init failed for some reason; your parallel process is
> >> likely to abort. There are many reasons that a parallel process can
> >> fail during opal_init; some of which are due to configuration or
> >> environment problems. This failure appears to be an internal failure;
> >> here's some additional information (which may only be relevant to an
> >> Open MPI developer):
> >>
> >> opal_shmem_base_select failed
> >> --> Returned value -1 instead of OPAL_SUCCESS
> >>
> --------------------------------------------------------------------------
> >> -------------------------------------------------------
> >> Primary job terminated normally, but 1 process returned
> >> a non-zero exit code.. Per user-direction, the job has been aborted.
> >> -------------------------------------------------------
> >>
> --------------------------------------------------------------------------
> >> mpirun detected that one or more processes exited with non-zero status,
> thus causing
> >> the job to be terminated. The first process to do so was:
> >>
> >> Process name: [[41236,1],1]
> >> Exit code: 1
> >>
> --------------------------------------------------------------------------
> >>
> >>
> >> This is a thread that I found where the Open MPI developers were having
> >> issues while integrating mpiJava into their stack.
> >>
> >> http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201202.mbox/%3C5EA543BD-A12E-4729-B66A-C746BC789EC3@open-mpi.org%3E
> >>
> >> I don't have better understanding of the internals of Open MPI, so my
> >> question is how to use Open MPI using JNI wrappers? Any guidelines in this
> >> regard?
> >>
> >> Thanks
> >> Bibrak
> >>
> >> <JNI_to_MPI.tar.gz>_______________________________________________
> >> devel mailing list
> >> devel_at_[hidden]
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/03/14335.php
> >
> >
> > --
> > Jeff Squyres
> > jsquyres_at_[hidden]
> > For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> > _______________________________________________
> > devel mailing list
> > devel_at_[hidden]
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/03/14337.php
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/03/14338.php
>





Bandwidth.png
bcast_bandwidth.png
Latency.png