Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Checkpoint/Restart error
From: Andreea Costea (andre.costea_at_[hidden])
Date: 2010-01-15 09:21:24


I don't know what else I should try, because it worked on 1.3.3 with
exactly the same steps. I tried installing it both with an active eth
interface and with an inactive one. I am running on a virtual machine with
CentOS as the OS.

Any suggestions?

Thanks,
Andreea
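
One way to follow the advice quoted further down (double check that 1.3.3 is
no longer in the path or library path) is to list every copy of the Open MPI
tools and libraries that the search paths can reach. The following is only a
minimal sketch under stated assumptions: the helper names (unique_dirs,
find_on_path, find_libraries) are hypothetical, the library stem "libmpi" is
an assumption about how the shared library is named, and LD_LIBRARY_PATH is
assumed to be the relevant library path. It does not look at the ldconfig
cache or at rpaths baked into the binaries, so a clean result here does not
prove that the installation is clean.

```python
import os


def unique_dirs(path_value):
    """Yield each non-empty directory of a PATH-style string once, in order."""
    seen = set()
    for entry in path_value.split(os.pathsep):
        if not entry:
            continue
        real = os.path.realpath(entry)
        if real in seen:
            continue
        seen.add(real)
        yield entry


def find_on_path(name, path_value):
    """Return every executable file called `name` reachable via path_value."""
    hits = []
    for entry in unique_dirs(path_value):
        candidate = os.path.join(entry, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits


def find_libraries(stem, path_value):
    """Return every file whose name starts with `stem` in the given directories."""
    hits = []
    for entry in unique_dirs(path_value):
        try:
            names = sorted(os.listdir(entry))
        except OSError:
            continue  # directory is missing or unreadable
        hits.extend(os.path.join(entry, n) for n in names if n.startswith(stem))
    return hits


if __name__ == "__main__":
    path_value = os.environ.get("PATH", "")
    lib_value = os.environ.get("LD_LIBRARY_PATH", "")  # assumed library path

    # Tool names are the ones mentioned in this thread.
    for tool in ("mpirun", "ompi-checkpoint", "ompi-restart"):
        copies = find_on_path(tool, path_value)
        note = "  <-- more than one copy" if len(copies) > 1 else ""
        print(f"{tool}: {copies}{note}")

    # "libmpi" is an assumed file-name stem, used here for illustration.
    print("libmpi:", find_libraries("libmpi", lib_value))
```

If any tool shows up in more than one directory, or the first hit is not
under the prefix where 1.4.1 was installed, that would be consistent with
the mixed-installation explanation in the quoted reply.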

On Fri, Jan 15, 2010 at 9:07 PM, Andreea Costea <andre.costea_at_[hidden]> wrote:

> I tried the new version that was uploaded today. I still get that error,
> only now it is at line 405 instead of 399.
>
> Maybe if I give more details:
> - I first had OpenMPI version 1.3.3 with BLCR installed: mpirun,
> ompi-checkpoint and ompi-restart worked with that version.
> - I wanted to update to version 1.4.1, so I uninstalled the previous version
> with make uninstall and then manually deleted all the leftover
> files. The directory where I had installed it was /usr/local.
> - I installed 1.4.1 in the same directory, /usr/local. Paths are set
> correctly to /usr/local/bin and /usr/local/lib.
> - mpirun works, ompi-checkpoint gives the following error:
> [[35906,0],0] ORTE_ERROR_LOG: Not found in file orte-checkpoint.c at line
> 405
> HNP with PID 7899 Not found!
>
> I would appreciate any help,
> Andreea
>
>
>
> On Fri, Jan 15, 2010 at 1:15 PM, Andreea Costea <andre.costea_at_[hidden]> wrote:
>
>> Hi...
>> still not working. Though I uninstalled OpenMPI with make uninstall and I
>> manually deleted all other files, I still have the same error when
>> checkpointing.
>>
>> Any idea?
>>
>> Thanks,
>> Andreea
>>
>>
>>
>> On Thu, Jan 14, 2010 at 10:38 PM, Joshua Hursey <jjhursey_at_[hidden]> wrote:
>>
>>> On Jan 14, 2010, at 8:20 AM, Andreea Costea wrote:
>>>
>>> > Hi,
>>> >
>>> > I wanted to try the C/R feature in OpenMPI version 1.4.1, which I
>>> downloaded today. When I try to checkpoint, I get the following error
>>> message:
>>> > [[65192,0],0] ORTE_ERROR_LOG: Not found in file orte-checkpoint.c at
>>> line 399
>>> > HNP with PID 2337 Not found!
>>>
>>> This looks like an error coming from the 1.3.3 install. In 1.4.1 there is
>>> no error at line 399, but in 1.3.3 there is. Check your installation of
>>> Open MPI; I bet you are mixing 1.4.1 and 1.3.3, which can cause unexpected
>>> problems.
>>>
>>> Try a clean installation of 1.4.1 and double check that 1.3.3 is not in
>>> your path/lib_path any longer.
>>>
>>> -- Josh
>>>
>>> >
>>> > I tried the same thing with version 1.3.3 and it works perfectly.
>>> >
>>> > Any idea why?
>>> >
>>> > thanks,
>>> > Andreea
>>> > _______________________________________________
>>> > users mailing list
>>> > users_at_[hidden]
>>> > http://www.open-mpi.org/mailman/listinfo.cgi/users