Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Open MPI Checkpoint Restart
From: Neel Sunil Desai (Neel.Desai_at_[hidden])
Date: 2013-06-03 12:34:11


Hi Ralph.

I checked the errors.
I do not understand what the following one means:

       The session directory location could not be parsed.
       ompi-checkpoint attempted to use the session directory:
         /tmp/openmpi-sessions-ndesai_at_vcainternmpi01_0

I opened the /tmp/openmpi-sessions-ndesai directory, and various directories
have been created there.
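
To compare the directory named in the error with what actually exists on disk, a small listing helper can be useful. This is only a sketch: `list_session_dirs` is a hypothetical name, and it assumes the session directories live directly under the given base directory with the `openmpi-sessions-` prefix shown in the message above.

```shell
# Hypothetical helper: print every directory under a base directory
# (default /tmp) whose name starts with "openmpi-sessions-".
list_session_dirs() {
    base_dir=${1:-/tmp}
    for entry in "$base_dir"/openmpi-sessions-*; do
        # If nothing matches, the pattern stays literal and fails this test.
        if [ -d "$entry" ]; then
            printf '%s\n' "$entry"
        fi
    done
    return 0
}

# Example: show the session directories currently present in /tmp.
list_session_dirs /tmp
```

If the directory that ompi-checkpoint reports is not in the output, the tool and the running mpirun are probably not looking at the same location.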

Also, when I run the mpi program, I get the following errors before the
program starts running correctly:

[ndesai_at_vcainternmpi01 work]$ mpirun -am ft-enable-cr --np 16 ./DecoderTest ../../decoder/test.ini
[vcainternmpi01:25341] mca: base: component_find: unable to open
/home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared
object file: No such file or directory (ignored)
[vcainternmpi01:25342] mca: base: component_find: unable to open
/home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared
object file: No such file or directory (ignored)
[vcainternmpi01:25343] mca: base: component_find: unable to open
/home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared
object file: No such file or directory (ignored)
[vcainternmpi01:25344] mca: base: component_find: unable to open
/home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared
object file: No such file or directory (ignored)
[vcainternmpi01:25347] mca: base: component_find: unable to open
/home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared
object file: No such file or directory (ignored)
[vcainternmpi01:25354] mca: base: component_find: unable to open
/home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared
object file: No such file or directory (ignored)
[vcainternmpi01:25356] mca: base: component_find: unable to open
/home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared
object file: No such file or directory (ignored)
[vcainternmpi01:25337] mca: base: component_find: unable to open
/home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared
object file: No such file or directory (ignored)
[vcainternmpi01:25338] mca: base: component_find: unable to open
/home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared
object file: No such file or directory (ignored)
[vcainternmpi01:25339] mca: base: component_find: unable to open
/home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared
object file: No such file or directory (ignored)
[vcainternmpi01:25340] mca: base: component_find: unable to open
/home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared
object file: No such file or directory (ignored)
[vcainternmpi01:25355] mca: base: component_find: unable to open
/home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared
object file: No such file or directory (ignored)
[vcainternmpi01:25359] mca: base: component_find: unable to open
/home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared
object file: No such file or directory (ignored)
[vcainternmpi01:25357] mca: base: component_find: unable to open
/home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared
object file: No such file or directory (ignored)
[vcainternmpi01:25358] mca: base: component_find: unable to open
/home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared
object file: No such file or directory (ignored)
[vcainternmpi01:25362] mca: base: component_find: unable to open
/home/ndesai/mpicr/lib/openmpi/mca_crs_blcr: libcr.so.0: cannot open shared
object file: No such file or directory (ignored)
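
The messages above say that the dynamic loader could not find libcr.so.0 when opening the BLCR component. One quick check is whether that file exists in any directory listed in LD_LIBRARY_PATH. The sketch below is an illustration only: `find_lib_in_path` is a hypothetical helper, and it looks only at the colon-separated list it is given, not at the loader cache, rpath entries, or the system default directories.

```shell
# Hypothetical helper: look for a library file name in each directory of a
# colon-separated search path and print the first match.
# The body runs in a subshell so the IFS and globbing changes do not leak.
find_lib_in_path() (
    lib_name=$1
    IFS=:
    set -f   # do not expand wildcard characters that appear in the path list
    for dir in $2; do
        if [ -n "$dir" ] && [ -e "$dir/$lib_name" ]; then
            printf '%s\n' "$dir/$lib_name"
            return 0
        fi
    done
    return 1
)

# Example: is libcr.so.0 reachable through LD_LIBRARY_PATH?
if found=$(find_lib_in_path libcr.so.0 "${LD_LIBRARY_PATH:-}"); then
    echo "libcr.so.0 found at $found"
else
    echo "libcr.so.0 not found in LD_LIBRARY_PATH"
fi
```

If the library is reported as not found, adding the directory that holds the BLCR libraries to LD_LIBRARY_PATH before calling mpirun is the usual next thing to try.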

I also checked the mca-params.conf file, and it contains only comments.
Do I have to make any changes there to get correct snapshots?

Thanks a lot,
Neel.

On Fri, May 31, 2013 at 5:24 PM, Ralph Castain <rhc_at_[hidden]> wrote:

> Did you check the items on the list given in the error? I'm no expert on
> ompi-checkpoint, but the error means that one of those conditions isn't
> being met.
>
>
> On May 31, 2013, at 4:54 PM, Neel Sunil Desai <Neel.Desai_at_[hidden]>
> wrote:
>
> Hi Ralph,
>
> Thanks for the help. PATH and LD_LIBRARY_PATH were not set to the correct
> location. After fixing them I was able to execute the ompi-checkpoint
> command, but I got the following error.
>
> [ndesai_at_vcainternmpi01 ~]$ ompi-checkpoint 1803
> --------------------------------------------------------------------------
> Error: Unable to find the requested, active MPIRUN process on this machine.
> This could be due to one of the following:
> - The jobid specified by the '--hnp-jobid' option is not
> correct.
> - The PID specified (1803) is not that of an active MPIRUN.
> - The application with this PID is not checkpointable
> - The application with this PID is not an Open MPI application.
> - The session directory location could not be parsed.
> ompi-checkpoint attempted to use the session directory:
> /tmp/openmpi-sessions-ndesai_at_vcainternmpi01_0
> Thanks,
> Neel.
>
> On Fri, May 31, 2013 at 4:34 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>
>> Check that your PATH and LD_LIBRARY_PATH are set to point to the
>> directory where you installed the version you built (the --prefix=<> you
>> provided).
>>
>> On May 31, 2013, at 4:31 PM, Neel Sunil Desai <Neel.Desai_at_[hidden]>
>> wrote:
>>
>> Hi Ralph,
>>
>> I did install Open MPI with the --with-ft=cr option.
>>
>> Thanks,
>> Neel.
>>
>> On Fri, May 31, 2013 at 4:25 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>
>>> Okay, it should work in that version. It sounds like you didn't
>>> configure OMPI with the --with-ft=cr option - yes? Take a look at
>>> "./configure -h" for the ft-related options and ensure you build what you
>>> need. C/R support is not built by default.
>>>
>>>
>>> On May 31, 2013, at 3:59 PM, Neel Sunil Desai <Neel.Desai_at_[hidden]>
>>> wrote:
>>>
>>> Open MPI 1.5.4
>>>
>>> On Fri, May 31, 2013 at 3:31 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>
>>>> What OMPI version?
>>>>
>>>> On May 31, 2013, at 3:17 PM, Neel Sunil Desai <Neel.Desai_at_[hidden]>
>>>> wrote:
>>>>
>>>> > Hi,
>>>> >
>>>> > I forgot to add: I watched Joshua Hursey's video, and when I type
>>>> > "ompi_info | grep FT", I get "FT Checkpoint Support: no (checkpoint
>>>> > thread: no)". I get no output when I type "ompi_info | grep crs".
>>>> >
>>>> > Thanks,
>>>> > Neel.