Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Checkpoint/Restart error
From: Andreea Costea (andre.costea_at_[hidden])
Date: 2010-01-15 08:07:02


I tried the new version that was uploaded today. I still get the same error,
except that it is now reported at line 405 instead of 399.

Here are some more details that may help:
- I first had Open MPI 1.3.3 installed with BLCR; mpirun,
ompi-checkpoint and ompi-restart all worked with that version.
- I wanted to update to 1.4.1, so I uninstalled the previous version
with make uninstall and then manually deleted all the leftover
files. The install directory was /usr/local.
- I installed 1.4.1 in the same directory, /usr/local. The paths are set
correctly to /usr/local/bin and /usr/local/lib.
- mpirun works, but ompi-checkpoint gives the following error:
[[35906,0],0] ORTE_ERROR_LOG: Not found in file orte-checkpoint.c at line 405
HNP with PID 7899 Not found!
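
A quick way to look for a mixed 1.3.3/1.4.1 installation like the one suspected in this thread is to list every copy of the Open MPI tools found on PATH and every Open MPI library found on the library path. The Python sketch below is illustrative only: the program names, the library patterns (libmpi.so*, libopen-rte.so*, libopen-pal.so*), and the assumption of a Linux-style LD_LIBRARY_PATH are guesses about a typical install, and the helper functions are hypothetical, not part of Open MPI.

```python
import glob
import os

# Assumed Open MPI program names to look for on PATH; adjust as needed.
PROGRAMS = ["mpirun", "ompi-checkpoint", "ompi-restart", "orte-checkpoint"]
# Assumed library name patterns (Linux .so naming; other platforms differ).
LIB_PATTERNS = ["libmpi.so*", "libopen-rte.so*", "libopen-pal.so*"]


def split_path(value):
    """Split a PATH-style string into its non-empty directory entries."""
    return [d for d in value.split(os.pathsep) if d]


def find_programs(path_dirs, names):
    """Return {name: [resolved paths]} for every executable match in path_dirs."""
    hits = {name: [] for name in names}
    for d in path_dirs:
        for name in names:
            candidate = os.path.join(d, name)
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                # Resolve symlinks so two links to one file count once.
                hits[name].append(os.path.realpath(candidate))
    return hits


def find_libraries(lib_dirs, patterns):
    """Return every file in lib_dirs whose name matches one of the glob patterns."""
    found = []
    for d in lib_dirs:
        for pattern in patterns:
            found.extend(sorted(glob.glob(os.path.join(d, pattern))))
    return found


def main():
    path_dirs = split_path(os.environ.get("PATH", ""))
    lib_dirs = split_path(os.environ.get("LD_LIBRARY_PATH", ""))
    # Both installs in this thread lived under /usr/local, so always check it.
    if "/usr/local/lib" not in lib_dirs:
        lib_dirs.append("/usr/local/lib")

    for name, paths in find_programs(path_dirs, PROGRAMS).items():
        unique = sorted(set(paths))
        flag = "  <-- more than one copy" if len(unique) > 1 else ""
        print(f"{name}: {unique or 'not found'}{flag}")

    print("Open MPI libraries seen:")
    for lib in find_libraries(lib_dirs, LIB_PATTERNS):
        print("  ", lib)


if __name__ == "__main__":
    main()
```

Run it from the same shell used to launch mpirun. A program flagged with more than one copy, or library files from an older build still sitting next to the new ones, would be consistent with the version mix Josh describes; an empty or single-copy result does not rule out stale plugin files elsewhere.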

I would appreciate any help,
Andreea

On Fri, Jan 15, 2010 at 1:15 PM, Andreea Costea <andre.costea_at_[hidden]> wrote:

> Hi,
> It is still not working. Although I uninstalled Open MPI with make uninstall
> and manually deleted all the remaining files, I still get the same error
> when checkpointing.
>
> Any ideas?
>
> Thanks,
> Andreea
>
>
>
> On Thu, Jan 14, 2010 at 10:38 PM, Joshua Hursey <jjhursey_at_[hidden]> wrote:
>
>> On Jan 14, 2010, at 8:20 AM, Andreea Costea wrote:
>>
>> > Hi,
>> >
>> > I wanted to try the C/R feature in Open MPI 1.4.1, which I downloaded
>> > today. When I try to checkpoint, I get the following error message:
>> > [[65192,0],0] ORTE_ERROR_LOG: Not found in file orte-checkpoint.c at line 399
>> > HNP with PID 2337 Not found!
>>
>> This looks like an error coming from the 1.3.3 install: in 1.4.1,
>> orte-checkpoint.c raises no error at line 399, but in 1.3.3 it does. Check
>> your installation of Open MPI; I bet you are mixing 1.4.1 and 1.3.3, which
>> can cause unexpected problems.
>>
>> Try a clean installation of 1.4.1 and double-check that 1.3.3 is no longer
>> in your PATH or library path (e.g. LD_LIBRARY_PATH).
>>
>> -- Josh
>>
>> >
>> > I tried the same thing with version 1.3.3 and it worked perfectly.
>> >
>> > Any idea why?
>> >
>> > thanks,
>> > Andreea
>> > _______________________________________________
>> > users mailing list
>> > users_at_[hidden]
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>
>