
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] mpirun error in OpenMPI 1.5
From: Nguyen Toan (nguyentoan1508_at_[hidden])
Date: 2010-12-08 15:05:21


Dear Ralph,

Thank you for your reply. I checked LD_LIBRARY_PATH and recompiled my code
with the new version, and it worked perfectly.
Thank you again.

Best Regards,
Toan
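
The check discussed in this thread can be sketched as a small POSIX shell helper. This is only an illustration under stated assumptions: the function names first_libmpi_dir and check_libmpi_path are hypothetical (they are not part of Open MPI), and the install prefix in the usage note is the --prefix value from the configure line quoted below. It reports which directory on a library search path would supply libmpi first and whether that directory belongs to the intended install.

```shell
# Hypothetical helpers (not part of Open MPI): inspect a colon-separated
# library search path such as $LD_LIBRARY_PATH.

# Print the first directory in the path list ($1) that contains libmpi.so*.
# Returns 1 if no directory on the list contains it.
first_libmpi_dir() {
    old_ifs=$IFS
    IFS=:
    for dir in $1; do
        IFS=$old_ifs
        # Skip empty entries such as the one in "a::b".
        [ -n "$dir" ] || continue
        for lib in "$dir"/libmpi.so*; do
            if [ -e "$lib" ]; then
                printf '%s\n' "$dir"
                return 0
            fi
        done
    done
    IFS=$old_ifs
    return 1
}

# Compare the first libmpi directory on the path list ($1) with the lib
# directory of the expected install prefix ($2).
check_libmpi_path() {
    found=$(first_libmpi_dir "$1") || {
        echo "no libmpi found on the search path"
        return 1
    }
    if [ "$found" = "$2/lib" ]; then
        echo "OK: libmpi comes from $found"
    else
        echo "MISMATCH: libmpi comes from $found, expected $2/lib"
        return 1
    fi
}
```

One possible use is to run check_libmpi_path "$LD_LIBRARY_PATH" /home/nguyen/opt/openmpi-1.5 on each backend node. The helper only looks at the search path it is given; the dynamic loader also consults other sources such as an rpath embedded in the binary, so running ldd on the compiled program and looking at the libmpi line remains the more direct check of what is actually loaded.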

On Thu, Dec 9, 2010 at 12:30 AM, Ralph Castain <rhc_at_[hidden]> wrote:

> That could mean you didn't recompile the code using the new version of
> OMPI. The 1.4 and 1.5 series are not binary compatible - you have to
> recompile your code.
>
> If you did recompile, you may be getting version confusion on the backend
> nodes - you should check your LD_LIBRARY_PATH and ensure it is pointing to
> the 1.5 series install.
>
> On Dec 8, 2010, at 8:02 AM, Nguyen Toan wrote:
>
> > Dear all,
> >
> > I am having a problem running mpirun with OpenMPI 1.5. I compiled
> > OpenMPI 1.5 with BLCR 0.8.2 and OFED 1.4.1 as follows:
> >
> > ./configure \
> > --with-ft=cr \
> > --enable-mpi-threads \
> > --with-blcr=/home/nguyen/opt/blcr \
> > --with-blcr-libdir=/home/nguyen/opt/blcr/lib \
> > --prefix=/home/nguyen/opt/openmpi-1.5 \
> > --with-openib \
> > --enable-mpirun-prefix-by-default
> >
> > For the programs in the "openmpi-1.5/examples" folder, the mpirun tests
> > were successful. But mpirun aborted immediately when running an MPI CUDA
> > program, which had been tested successfully with OpenMPI 1.4.3. Below is
> > the error message.
> >
> > Can anyone give me an idea about this error?
> > Thank you.
> >
> > Best Regards,
> > Toan
> > ----------------------
> >
> >
> > [rc002.local:17727] [[56831,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file util/nidmap.c at line 371
> > --------------------------------------------------------------------------
> > It looks like orte_init failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during orte_init; some of which are due to configuration or
> > environment problems. This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> >
> > orte_ess_base_build_nidmap failed
> > --> Returned value Data unpack would read past end of buffer (-26) instead of ORTE_SUCCESS
> > --------------------------------------------------------------------------
> > [rc002.local:17727] [[56831,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file base/ess_base_nidmap.c at line 62
> > [rc002.local:17727] [[56831,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ess_env_module.c at line 173
> > --------------------------------------------------------------------------
> > It looks like orte_init failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during orte_init; some of which are due to configuration or
> > environment problems. This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> >
> > orte_ess_set_name failed
> > --> Returned value Data unpack would read past end of buffer (-26) instead of ORTE_SUCCESS
> > --------------------------------------------------------------------------
> > [rc002.local:17727] [[56831,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file runtime/orte_init.c at line 132
> > --------------------------------------------------------------------------
> > It looks like MPI_INIT failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during MPI_INIT; some of which are due to configuration or environment
> > problems. This failure appears to be an internal failure; here's some
> > additional information (which may only be relevant to an Open MPI
> > developer):
> >
> > ompi_mpi_init: orte_init failed
> > --> Returned "Data unpack would read past end of buffer" (-26) instead of "Success" (0)
> > --------------------------------------------------------------------------
> > *** An error occurred in MPI_Init
> > *** before MPI was initialized
> > *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> > [rc002.local:17727] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
> > --------------------------------------------------------------------------
> > mpirun has exited due to process rank 1 with PID 17727 on
> > node rc002 exiting improperly. There are two reasons this could occur:
> >
> > 1. this process did not call "init" before exiting, but others in
> > the job did. This can cause a job to hang indefinitely while it waits
> > for all processes to call "init". By rule, if one process calls "init",
> > then ALL processes must call "init" prior to termination.
> >
> > 2. this process called "init", but exited without calling "finalize".
> > By rule, all processes that call "init" MUST call "finalize" prior to
> > exiting or it will be considered an "abnormal termination"
> >
> > This may have caused other processes in the application to be
> > terminated by signals sent by mpirun (as reported here).
> >
> > --------------------------------------------------------------------------
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>