Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] *** An error occurred in MPI_Init
From: Josh Hursey (jjhursey_at_[hidden])
Date: 2009-05-11 09:06:25


What versions of BLCR and Open MPI are you using?

Have you tried to checkpoint/restart a single (non-MPI) application
with BLCR? BLCR ships with some examples, and I would suggest making
sure those work before moving on to Open MPI.
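A minimal sketch of such a BLCR-only test, assuming BLCR's cr_run, cr_checkpoint and cr_restart tools are installed on your PATH and that cr_checkpoint writes its default context.<pid> file into the current directory. The function names blcr_missing_tools and blcr_smoke_test are hypothetical, as is the program you pass in; any long-running non-MPI program (such as one of the BLCR examples) will do.

```shell
# Hypothetical helper: print the BLCR command-line tools that are NOT
# found on $PATH (prints an empty line when all of them are found).
blcr_missing_tools() {
    missing=""
    for tool in cr_run cr_checkpoint cr_restart; do
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    printf '%s\n' "${missing# }"
}

# Hypothetical smoke test: start a non-MPI program under BLCR, checkpoint it,
# kill it, then restart it from the saved context file.
# Assumes cr_run execs the program, so $! is the program's own PID.
blcr_smoke_test() {
    prog=$1
    cr_run "$prog" &
    pid=$!
    sleep 2                                  # let the program get going
    cr_checkpoint "$pid" || { kill "$pid"; return 1; }
    kill "$pid"
    wait "$pid" 2>/dev/null                  # reap it; nonzero status expected
    cr_restart "context.$pid"                # assumed default context file name
}
```

If blcr_missing_tools prints nothing and blcr_smoke_test resumes your program after the restart, BLCR itself is probably fine and the problem is on the Open MPI side.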

Typically this type of failure is the result of BLCR's cr_init()
failing. Do you happen to see an error message like the following?
   Error: crs:blcr: module_init: cr_init failed
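If the mpirun output is long, a small filter makes that check scriptable. This is a sketch: has_cr_init_failure is a hypothetical helper name, and it only looks for the exact message text quoted above.

```shell
# Hypothetical helper: read captured mpirun output on stdin and exit 0 if it
# contains the BLCR cr_init() failure message, 1 otherwise.
has_cr_init_failure() {
    grep -qF 'crs:blcr: module_init: cr_init failed'
}

# Example: merge stderr into stdout, since Open MPI may report errors on stderr.
# mpirun -am ft-enable-cr helloworld 2>&1 | has_cr_init_failure \
#     && echo "BLCR cr_init() failed"
```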

You could also try to see if something more subtle is happening by
turning on verbosity with the following command line switch:
   -mca crs_base_verbose 10
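To keep that verbose output around for posting to the list, one option is to save it to a file and pull out the checkpoint/restart lines. A minimal sketch: run_cr_verbose is a hypothetical name, and filtering on the string "crs" is an assumption about how the verbose lines are labeled.

```shell
# Hypothetical wrapper: run an application under mpirun with checkpoint/restart
# enabled and crs verbosity at 10, save all output to LOG, print the lines that
# mention "crs", and return mpirun's exit status.
run_cr_verbose() {
    log=$1
    shift
    status=0
    mpirun -am ft-enable-cr -mca crs_base_verbose 10 "$@" >"$log" 2>&1 || status=$?
    grep 'crs' "$log" || :    # no matching lines is not an error here
    return "$status"
}
```

For example, `run_cr_verbose crs.log helloworld` leaves the full output in crs.log for attaching to a reply.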

Let me know how those go, and we can keep debugging from there.

-- Josh

On May 8, 2009, at 6:47 PM, Kritiraj Sajadah wrote:

>
> Hi Gus,
>
> Thanks for your email. I have /usr/local/bin included in my
> $PATH. (Not /usr/local/include - it was just a copying mistake).
>
> I checked where mpicc and mpirun are and got the following paths:
>
> /usr/local/bin/mpirun
> /usr/local/bin/mpicc
>
> The BLCR I am using was downloaded and installed separately.
>
> 1) Do you think I may be using the wrong version of BLCR?
> There is a directory called blcr within the Open MPI tarball
> (openmpi-1.3/opal/mca/crs/blcr). Should I use this?
>
> 2) Do you think it's better to install Open MPI in /usr/local/openmpi
> and BLCR in /usr/local/blcr?
>
> 3) If so, how do I uninstall the ones I have already installed?
>
> Thank you
>
> Kritiraj
>
>
>
> --- On Fri, 5/8/09, Gus Correa <gus_at_[hidden]> wrote:
>
>> From: Gus Correa <gus_at_[hidden]>
>> Subject: Re: [OMPI users] *** An error occurred in MPI_Init
>> To: "Open MPI Users" <users_at_[hidden]>
>> Date: Friday, May 8, 2009, 6:33 PM
>> PS - Kritiraj
>>
>> Reading your message more carefully, I saw that you did this:
>>
>> ****
>> Opened $HOME/.bashrc and added the following:
>>
>> PATH="/usr/local/include:$PATH"
>> LD_LIBRARY_PATH="/usr/local/lib:$LD_LIBRARY_PATH"
>>
>> ****
>>
>> However, this is what you should have done:
>>
>> ****
>> Open $HOME/.bashrc and add the following:
>>
>> export PATH="/usr/local/bin:$PATH"
>> export LD_LIBRARY_PATH="/usr/local/lib:$LD_LIBRARY_PATH"
>>
>> ****
>>
>> Note that /usr/local/bin, not /usr/local/include, should be
>> prepended to your PATH!
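A quick way to confirm the fix took effect in a new shell is to test whether /usr/local/bin is really one of the PATH entries. A minimal sketch: path_has_dir is a hypothetical helper name, and it compares whole colon-separated entries, so a longer directory such as /opt/usr/local/bin does not count as a match.

```shell
# Hypothetical helper: exit 0 if DIR is exactly one of the ':'-separated
# entries in $PATH, 1 otherwise.
path_has_dir() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (after opening a new terminal or running ". ~/.bashrc"):
# path_has_dir /usr/local/bin || echo "/usr/local/bin is not on PATH"
```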
>>
>>
>> Gus Correa
>> ---------------------------------------------------------------------
>> Gustavo Correa
>> Lamont-Doherty Earth Observatory - Columbia University
>> Palisades, NY, 10964-8000 - USA
>> ---------------------------------------------------------------------
>>
>>
>> Gus Correa wrote:
>>> Hi Kritiraj
>>>
>>> This looks like many other errors reported on this list
>>> that are caused by using the wrong MPI compiler wrappers
>>> or the wrong mpirun/mpiexec.
>>> Typically this happens when the PATH environment variable
>>> points to the wrong executables (mpicc, mpirun).
>>> Most Linux distributions, compilers, etc. come with their
>>> own MPI versions, and this can be very confusing.
>>>
>>> Try using full path names for mpicc and for mpirun.
>>> That is a bulletproof method to get exactly what you want.
>>> In your case, use the ones in /usr/local/bin (since you configured
>>> with --prefix=/usr/local).
>>> (Actually, I prefer to configure with a more distinctive
>>> prefix, something like /usr/local/openmpi-1.3.2,
>>> to avoid any confusion with other MPIs.)
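Spelled out, the full-path approach looks roughly like the sketch below. It assumes Open MPI was configured with --prefix=/usr/local, as in the original post, and that helloworld.c is the test program from that post; run_with_prefix is a hypothetical helper name.

```shell
# Hypothetical helper: compile and launch the test program with the mpicc and
# mpirun from one specific Open MPI installation, bypassing the $PATH lookup.
run_with_prefix() {
    prefix=$1                      # the --prefix given to Open MPI's configure
    "$prefix/bin/mpicc" -o helloworld helloworld.c || return 1
    "$prefix/bin/mpirun" -am ft-enable-cr ./helloworld
}

# Example: run_with_prefix /usr/local
```

Passing the prefix as an argument means the same helper works unchanged if you later reinstall under a distinctive prefix such as /usr/local/openmpi-1.3.2.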
>>>
>>> You can also try "which mpicc" and "which mpirun",
>>> or "mpicc --showme" and "mpirun --help", to get a bit more
>>> information about what you are really using.
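Those checks can be rolled into one report that also flags a mismatch with the installation you meant to use. A sketch only: check_mpi_tools is a hypothetical name, and the expected prefix (for example /usr/local) is passed in by you.

```shell
# Hypothetical helper: for mpicc and mpirun, print where $PATH resolves them
# and warn when that is not PREFIX/bin, i.e. another MPI is shadowing yours.
check_mpi_tools() {
    prefix=$1
    for tool in mpicc mpirun; do
        if ! found=$(command -v "$tool"); then
            echo "$tool: not found on PATH"
        elif [ "$found" = "$prefix/bin/$tool" ]; then
            echo "$tool: OK ($found)"
        else
            echo "$tool: WARNING, using $found instead of $prefix/bin/$tool"
        fi
    done
}

# Example:
# check_mpi_tools /usr/local
# mpicc --showme    # prints the underlying compiler command line
```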
>>>
>>> I hope this helps.
>>> Gus Correa
>>>
>>>
>>> Kritiraj Sajadah wrote:
>>>> Dear All,
>>>> I have installed and configured Open MPI with BLCR on my laptop:
>>>>
>>>> 1) configure and install blcr
>>>>
>>>> ./configure --prefix=/usr/local/ --enable-debug=yes \
>>>>     --enable-libcr-tracing=yes --enable-kernel-tracing=yes \
>>>>     --enable-testsuite=yes --enable-all-static=yes --enable-static=yes
>>>>
>>>> make
>>>> make install
>>>>
>>>> 2) configure and install openmpi
>>>>
>>>> ./configure --prefix=/usr/local/ --enable-picky \
>>>>     --enable-debug --enable-mpi-profile --enable-mpi-cxx \
>>>>     --enable-pretty-print-stacktrace --enable-binaries \
>>>>     --enable-trace --enable-static=yes --enable-debug \
>>>>     --with-devel-headers=1 --with-mpi-param-check=always \
>>>>     --with-ft=cr --enable-ft-thread --with-blcr=/usr/local/ \
>>>>     --with-blcr-libdir=/usr/local/lib --enable-mpi-threads=yes
>>>>
>>>> make all install
>>>>
>>>> 3) add the environment variables.
>>>>
>>>>
>>>> Opened $HOME/.bashrc and added the following:
>>>>
>>>> PATH="/usr/local/include:$PATH"
>>>> LD_LIBRARY_PATH="/usr/local/lib:$LD_LIBRARY_PATH"
>>>>
>>>> Now the problem:
>>>>
>>>> I am trying to checkpoint the following MPI application:
>>>>
>>>> #include <stdio.h>
>>>> #include <mpi.h>
>>>>
>>>> int main(int argc, char **argv)
>>>> {
>>>>     int node;
>>>>
>>>>     MPI_Init(&argc, &argv);
>>>>     /* Rank of this process within MPI_COMM_WORLD */
>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &node);
>>>>
>>>>     printf("Hello World from Node %d\n", node);
>>>>
>>>>     MPI_Finalize();
>>>>     return 0;
>>>> }
>>>>
>>>> I am running mpirun as follows:
>>>>
>>>> raj-laptop> mpirun -am ft-enable-cr helloworld
>>>>
>>>> The errors are as follows:
>>>>
>>>>
>>>> --------------------------------------------------------------------------
>>>> It looks like opal_init failed for some reason; your parallel process is
>>>> likely to abort. There are many reasons that a parallel process can
>>>> fail during opal_init; some of which are due to configuration or
>>>> environment problems. This failure appears to be an internal failure;
>>>> here's some additional information (which may only be relevant to an
>>>> Open MPI developer):
>>>>
>>>> opal_cr_init() failed failed
>>>> --> Returned value -1 instead of OPAL_SUCCESS
>>>> --------------------------------------------------------------------------
>>>> *** An error occurred in MPI_Init
>>>> *** before MPI was initialized
>>>> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>>>> [raj-laptop:9439] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
>>>> [raj-laptop:09439] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at line 77
>>>> --------------------------------------------------------------------------
>>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>>> likely to abort. There are many reasons that a parallel process can
>>>> fail during MPI_INIT; some of which are due to configuration or environment
>>>> problems. This failure appears to be an internal failure; here's some
>>>> additional information (which may only be relevant to an Open MPI
>>>> developer):
>>>>
>>>> ompi_mpi_init: orte_init failed
>>>> --> Returned "Error" (-1) instead of "Success" (0)
>>>> --------------------------------------------------------------------------
>>>>
>>>> Is it something to do with running it on a single node, i.e. my
>>>> laptop? Or is it something to do with configurations or libraries?
>>>>
>>>>
>>>> Any help would be much appreciated.
>>>>
>>>> Regards,
>>>>
>>>> Raj
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>