Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Checkpoint/Restart error
From: Andreea Costea (andre.costea_at_[hidden])
Date: 2010-01-25 05:48:39


So? Anyone? Any clue?

To summarize:
- installed OpenMPI 1.4.1 on a fresh CentOS 5
- mpirun works, but ompi-checkpoint throws this error:
ORTE_ERROR_LOG: Not found in file orte-checkpoint.c at line 405
- on another VM I have OpenMPI 1.3.3 installed. Checkpointing works fine as
guest but gives the previously mentioned error as root. Both root and guest show
the same output from "ompi_info --param all all" except for $HOME (which only
matters for mca_component_path, mca_param_files,
snapc_base_global_snapshot_dir)
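
The root-versus-guest comparison described in this thread could be scripted roughly as follows. This is only a sketch: the helper name diff_params and the /tmp/params_*.txt file names are made up for illustration, and the helper makes no assumption about the exact format of the ompi_info output; it simply reports the lines that differ between two saved dumps.

```shell
#!/bin/sh
# Sketch: show the lines that differ between two saved parameter dumps.
# diff_params is a hypothetical helper name, not part of Open MPI.

# $1 and $2 are paths to two text files, e.g. one dump saved as root and
# one saved as another user.
diff_params() {
    # diff marks lines found only in the first file with "<" and lines
    # found only in the second file with ">"; keep just those lines.
    diff "$1" "$2" | grep '^[<>]'
}

# Possible usage (file names are assumptions):
#   ompi_info --param all all > /tmp/params_root.txt    # run as root
#   ompi_info --param all all > /tmp/params_guest.txt   # run as guest
#   diff_params /tmp/params_root.txt /tmp/params_guest.txt
```

If the only lines reported are the three $HOME-dependent parameters, the two accounts see the same MCA configuration otherwise, which matches what was observed here.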

Thanks,
Andreea

On Tue, Jan 19, 2010 at 9:01 PM, Andreea Costea <andre.costea_at_[hidden]> wrote:

> I noticed one more thing. As I still have some VMs that have OpenMPI
> version 1.3.3 installed, I started to use those machines until I fix the
> problem with 1.4.1. While checkpointing on one of these VMs I realized
> that checkpointing as a guest works fine and checkpointing as root outputs
> the same error as in 1.4.1: ORTE_ERROR_LOG: Not found in file
> orte-checkpoint.c at line 405
>
> I logged the output of "ompi_info --param all all", which I ran for root
> and for another user, and the only differences were in these parameters:
>
> mca_component_path
> mca_param_files
> snapc_base_global_snapshot_dir
>
> All 3 params differ because of the $HOME.
> One more thing: I don't have the directory $HOME/.openmpi
>
> Ideas?
>
> Thanks,
> Andreea
>
> On Tue, Jan 19, 2010 at 12:51 PM, Andreea Costea <andre.costea_at_[hidden]> wrote:
>
>> Well... I decided to install a fresh OS to be sure that there is no
>> OpenMPI version conflict. So I formatted one of my VMs, did a fresh CentOS
>> install, installed BLCR 0.8.2 and OpenMPI 1.4.1 and the result: the same.
>> mpirun works but ompi-checkpoint has that error at line 405:
>>
>> [[35906,0],0] ORTE_ERROR_LOG: Not found in file orte-checkpoint.c at line
>> 405
>>
>> As for the files remaining after uninstalling: Jeff, you were right. There
>> are no files left, just some empty directories.
>>
>> What might be causing that ORTE_ERROR_LOG error?
>>
>> Thanks,
>> Andreea
>>
>> On Fri, Jan 15, 2010 at 11:47 PM, Andreea Costea <andre.costea_at_[hidden]> wrote:
>>
>>> It's almost midnight here, so I have gone home, but I will try it tomorrow.
>>> There were some directories left after "make uninstall". I will give more
>>> details tomorrow.
>>>
>>> Thanks Jeff,
>>> Andreea
>>>
>>>
>>> On Fri, Jan 15, 2010 at 11:30 PM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
>>>
>>>> On Jan 15, 2010, at 8:07 AM, Andreea Costea wrote:
>>>>
>>>> > - I wanted to update to version 1.4.1 and I uninstalled the previous
>>>> > version like this: make uninstall, and then manually deleted all the
>>>> > leftover files. The directory where I installed was /usr/local
>>>>
>>>> I'll let Josh answer your CR questions, but I did want to ask about this
>>>> point. AFAIK, "make uninstall" removes *all* Open MPI files. For example:
>>>>
>>>> -----
>>>> [7:25] $ cd /path/to/my/OMPI/tree
>>>> [7:25] $ make install > /dev/null
>>>> [7:26] $ find /tmp/bogus/ -type f | wc
>>>> 646 646 28082
>>>> [7:26] $ make uninstall > /dev/null
>>>> [7:27] $ find /tmp/bogus/ -type f | wc
>>>> 0 0 0
>>>> [7:27] $
>>>> -----
>>>>
>>>> I realize that some *directories* are left in $prefix, but there should
>>>> be no *files* left. Are you seeing something different?
>>>>
>>>> --
>>>> Jeff Squyres
>>>> jsquyres_at_[hidden]