Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] *** An error occurred in MPI_Init
From: Josh Hursey (jjhursey_at_[hidden])
Date: 2009-05-11 09:06:25


What versions of BLCR and Open MPI are you using?

Have you tried to checkpoint/restart a single (non-MPI) application
with BLCR? BLCR ships with some examples, and I would suggest trying
to make sure those work before moving on to Open MPI.
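
A minimal sketch of that standalone check, assuming the BLCR command-line tools (cr_run, cr_checkpoint, cr_restart) are installed and the BLCR kernel modules are loaded. The function names and the use of "sleep" as the test program are illustrative only and are not something BLCR ships:

```shell
#!/bin/sh
# Sketch: checkpoint and restart one ordinary (non-MPI) process with BLCR.
# Assumes the BLCR tools are on PATH and the kernel modules are loaded.
# Nothing here involves Open MPI.

# Print the BLCR tools that cannot be found on PATH (empty line if none).
blcr_missing_tools() {
    missing=""
    for tool in cr_run cr_checkpoint cr_restart; do
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    printf '%s\n' "${missing# }"
}

# Checkpoint a background "sleep", then restart it from its context file.
blcr_smoke_test() {
    missing=$(blcr_missing_tools)
    if [ -n "$missing" ]; then
        echo "BLCR tools not found: $missing" >&2
        return 1
    fi
    cr_run sleep 30 &                        # start a plain process under BLCR
    pid=$!
    sleep 2                                  # give it a moment to start
    cr_checkpoint --term "$pid" || return 1  # write context.<pid>, then terminate it
    wait "$pid" 2>/dev/null                  # reap it so the PID is free again
    cr_restart "context.$pid"                # resume from the saved image
}
```

Calling blcr_smoke_test from a writable directory should leave a context.<pid> file behind. If the checkpoint or restart step fails here, the problem is more likely in the BLCR installation than in Open MPI.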

Typically this type of failure is the result of BLCR's cr_init()
failing. Do you happen to see an error message like the following:
   Error: crs:blcr: module_init: cr_init failed

You could also try to see if something more subtle is happening by
turning on verbosity with the following command line switch:
   -mca crs_base_verbose 10
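
As a rough sketch, the switch can be added to the mpirun command line from the original report as shown here. The run_crs_verbose helper, the ./helloworld path and the crs_verbose.log file name are placeholders chosen for illustration:

```shell
#!/bin/sh
# Sketch: rerun the failing job with CRS framework verbosity turned on and
# keep a copy of the output. All names below are placeholders.

run_crs_verbose() {
    prog=$1
    # Same command line as in the report, plus the verbosity switch.
    set -- mpirun -am ft-enable-cr -mca crs_base_verbose 10 "$prog"
    printf '+ %s\n' "$*"                     # show what is about to run
    if ! command -v mpirun >/dev/null 2>&1; then
        echo "mpirun not found on PATH" >&2
        return 1
    fi
    "$@" 2>&1 | tee crs_verbose.log          # capture stdout and stderr
}
```

Running "run_crs_verbose ./helloworld" prints the command line first, so it is easy to confirm which options were passed before sending crs_verbose.log to the list.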

Let me know how those go, and we can keep debugging from there.

-- Josh

On May 8, 2009, at 6:47 PM, Kritiraj Sajadah wrote:

>
> Hi Gus,
>
> Thanks for your email. I have /usr/local/bin included in my
> $PATH. (Not /usr/local/include - it was just a copying mistake).
>
> I checked where mpicc and mpirun are and I got the following paths:
>
> /usr/local/bin/mpirun
> /usr/local/bin/mpicc
>
> The BLCR I am using was downloaded and installed separately.
>
> 1) Do you think I may be using the wrong version of BLCR?
> There is a directory called blcr within the openmpi tarball
> (openmpi-1.3/opal/mca/crs/blcr). Should I use this?
>
> 2) Do you think it's better to install openmpi in /usr/local/openmpi
> and blcr in /usr/local/blcr?
>
> 3) If so, how do I uninstall the one I have already?
>
> Thank you
>
> Kritiraj
>
>
>
> --- On Fri, 5/8/09, Gus Correa <gus_at_[hidden]> wrote:
>
>> From: Gus Correa <gus_at_[hidden]>
>> Subject: Re: [OMPI users] *** An error occurred in MPI_Init
>> To: "Open MPI Users" <users_at_[hidden]>
>> Date: Friday, May 8, 2009, 6:33 PM
>> PS - Kritiraj
>>
>> Reading your message more carefully, I saw that you did this:
>>
>> ****
>> Opened $HOME/.bashrc and added the following:
>>
>> PATH="/usr/local/include:$PATH"
>> LD_LIBRARY_PATH="/usr/local/lib:$LD_LIBRARY_PATH"
>>
>> ****
>>
>> However, this is what you should have done:
>>
>> ****
>> Opened $HOME/.bashrc and added the following:
>>
>> PATH="/usr/local/bin:$PATH"
>> LD_LIBRARY_PATH="/usr/local/lib:$LD_LIBRARY_PATH"
>>
>> ****
>>
>> Note that /usr/local/bin, not /usr/local/include, should be
>> prepended to your PATH!
>>
>>
>> Gus Correa
>> ---------------------------------------------------------------------
>> Gustavo Correa
>> Lamont-Doherty Earth Observatory - Columbia University
>> Palisades, NY, 10964-8000 - USA
>> ---------------------------------------------------------------------
>>
>>
>> Gus Correa wrote:
>>> Hi Kritiraj
>>>
>>> This looks like many other errors reported on this list
>>> that are caused by using the wrong MPI compiler wrappers
>>> or the wrong mpirun/mpiexec.
>>> Typically this is caused by a PATH environment variable that
>>> is pointing to the wrong executables (mpicc, mpirun).
>>> Most Linux distributions, compilers, etc., come with their
>>> own MPI versions, and this can be very confusing.
>>>
>>> Try using full path names for mpicc and for mpirun.
>>> That is a bulletproof method to get exactly what you want.
>>> In your case use /usr/local/bin (as you configured with
>>> --prefix=/usr/local).
>>> (Actually, I prefer to configure with a more distinctive
>>> name for the prefix, something like /usr/local/openmpi-1.3.2,
>>> to avoid any confusion with other MPIs.)
>>>
>>> You can also try "which mpicc" and "which mpirun",
>>> or "mpicc --showme" and "mpirun --help" to get a bit more
>>> information about what you are really using.
>>>
>>> I hope this helps.
>>> Gus Correa
>>>
>>> ---------------------------------------------------------------------
>>> Gustavo Correa
>>> Lamont-Doherty Earth Observatory - Columbia University
>>> Palisades, NY, 10964-8000 - USA
>>> ---------------------------------------------------------------------
>>>
>>>
>>> Kritiraj Sajadah wrote:
>>>> Dear All,
>>>> I have installed and configured openmpi with BLCR on my laptop:
>>>>
>>>> 1) configure and install blcr
>>>>
>>>> ./configure --prefix=/usr/local/ --enable-debug=yes \
>>>>     --enable-libcr-tracing=yes --enable-kernel-tracing=yes \
>>>>     --enable-testsuite=yes --enable-all-static=yes --enable-static=yes
>>>>
>>>> make
>>>> make install
>>>>
>>>> 2) configure and install openmpi
>>>>
>>>> ./configure --prefix=/usr/local/ --enable-picky --enable-debug \
>>>>     --enable-mpi-profile --enable-mpi-cxx \
>>>>     --enable-pretty-print-stacktrace --enable-binaries \
>>>>     --enable-trace --enable-static=yes --enable-debug \
>>>>     --with-devel-headers=1 --with-mpi-param-check=always \
>>>>     --with-ft=cr --enable-ft-thread --with-blcr=/usr/local/ \
>>>>     --with-blcr-libdir=/usr/local/lib --enable-mpi-threads=yes
>>>>
>>>> make all install
>>>>
>>>> 3) add the environment variables.
>>>>
>>>>
>>>> Opened $HOME/.bashrc and added the following:
>>>>
>>>> PATH="/usr/local/include:$PATH"
>>>> LD_LIBRARY_PATH="/usr/local/lib:$LD_LIBRARY_PATH"
>>>>
>>>> Now the problem:
>>>>
>>>> I am trying to checkpoint the following MPI application:
>>>>
>>>> #include <stdio.h>
>>>> #include <mpi.h>
>>>>
>>>> int main(int argc, char **argv)
>>>> {
>>>>     int node;
>>>>
>>>>     MPI_Init(&argc, &argv);
>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &node);
>>>>
>>>>     printf("Hello World from Node %d\n", node);
>>>>
>>>>     MPI_Finalize();
>>>>     return 0;
>>>> }
>>>>
>>>> I am running mpirun as follows:
>>>>
>>>> raj-laptop> mpirun -am ft-enable-cr helloworld.
>>>>
>>>> The errors are as follows:
>>>>
>>>>
>>>> --------------------------------------------------------------------------
>>>> It looks like opal_init failed for some reason; your parallel process is
>>>> likely to abort. There are many reasons that a parallel process can
>>>> fail during opal_init; some of which are due to configuration or
>>>> environment problems. This failure appears to be an internal failure;
>>>> here's some additional information (which may only be relevant to an
>>>> Open MPI developer):
>>>>
>>>> opal_cr_init() failed failed
>>>> --> Returned value -1 instead of OPAL_SUCCESS
>>>> --------------------------------------------------------------------------
>>>> *** An error occurred in MPI_Init
>>>> *** before MPI was initialized
>>>> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>>>> [raj-laptop:9439] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
>>>> [raj-laptop:09439] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at line 77
>>>> --------------------------------------------------------------------------
>>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>>> likely to abort. There are many reasons that a parallel process can
>>>> fail during MPI_INIT; some of which are due to configuration or environment
>>>> problems. This failure appears to be an internal failure; here's some
>>>> additional information (which may only be relevant to an Open MPI
>>>> developer):
>>>>
>>>> ompi_mpi_init: orte_init failed
>>>> --> Returned "Error" (-1) instead of "Success" (0)
>>>> --------------------------------------------------------------------------
>>>>
>>>> Is it something to do with me running it on a single node, i.e. my
>>>> laptop? Or is it something to do with configurations or libraries?
>>>>
>>>>
>>>> Any help would be much appreciated.
>>>>
>>>> Regards,
>>>>
>>>> Raj
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users