Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Segfault in ompi-restart (ft-enable-cr)
From: Joshua Hursey (jjhursey_at_[hidden])
Date: 2010-03-03 16:34:07


On Mar 3, 2010, at 3:42 PM, Fernando Lemos wrote:

> On Wed, Mar 3, 2010 at 5:31 PM, Joshua Hursey <jjhursey_at_[hidden]> wrote:
> <snip>
>>
>> Yes, ompi-restart should be printing a helpful message and exiting normally. Thanks for the bug report. I believe that I have seen and fixed this on a development branch making its way to the trunk. I'll make sure to move the fix to the 1.4 series once it has been applied to the trunk.
>>
>> I filed a ticket on this if you wanted to track the issue.
>> https://svn.open-mpi.org/trac/ompi/ticket/2329
>
> Ah, that's great. Just wondering, do you have any idea why blcr-util
> is required? That package only contains the cr_* binaries (cr_restart,
> cr_checkpoint, cr_run) and some docs (manpages, changelog, etc.). I've
> filed a Debian bug (#572229) about making openmpi-checkpoint depend
> on blcr-util, but the package maintainer told me he found it unusual
> that ompi-restart would depend on the cr_* binaries since libcr
> supposedly provides all the functionality ompi-restart needs.
>
> I'm about to compile OpenMPI in debug mode and take a look at the
> backtrace to see if I can understand what's going on.
>
> Btw, this is the list of files in the blcr-util package:
> http://packages.debian.org/sid/amd64/blcr-util/filelist . As you can
> see, only cr_* binaries and docs.

Open MPI currently calls 'cr_restart' for each process it restarts; the 'opal-restart' binary exec's it (LAM/MPI also used cr_restart directly, in case anyone is interested). That is why the cr_* binaries from blcr-util are needed. We use the BLCR library interface for checkpointing, but not for restarting at this time.
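
As a rough illustration of why a missing 'cr_restart' matters, here is a minimal Python sketch of a launcher that looks the tool up on PATH before running it and reports a readable error when it is absent. This is not the actual opal-restart code (which is C); the function names and the context file argument are hypothetical, and the only assumption about cr_restart is that it takes the checkpoint context file as its argument.

```python
import shutil
import subprocess


def build_restart_command(context_file, restart_tool="cr_restart"):
    """Return the argv used to restart one process, or None if the tool is missing."""
    tool_path = shutil.which(restart_tool)  # searches PATH for an executable
    if tool_path is None:
        return None
    return [tool_path, context_file]


def restart_process(context_file, restart_tool="cr_restart"):
    """Run the restart tool on one context file (hypothetical helper).

    Returns (exit_code, message). A missing tool produces a readable
    error instead of a crash.
    """
    argv = build_restart_command(context_file, restart_tool)
    if argv is None:
        return 1, f"error: '{restart_tool}' not found in PATH (on Debian it ships in blcr-util)"
    return subprocess.call(argv), "restart attempted with: " + " ".join(argv)
```

Checking for the binary up front is one way to get the "helpful message and normal exit" behavior discussed above; the real fix is tracked in ticket 2329.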

If I recall correctly, BLCR only relatively recently added the ability to restart a process from a library call. We have not yet written the code to use this functionality (though all of the framework interfaces are in place to do so). On my development branch I will add the ability to use the BLCR library interface when it is available. That functionality will likely not make it into the v1.4 release series, since it is not really a bug fix, but I plan to include it in v1.5 and later releases. So that I don't lose track of it, I created an enhancement ticket for this:
  https://svn.open-mpi.org/trac/ompi/ticket/2330

Cheers,
Josh

>
>>
>> Thanks again,
>> Josh
>
> Thank you!