Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] ompi-restart, ompi-ps problem
From: Josh Hursey (jjhursey_at_[hidden])
Date: 2010-07-16 17:45:56


(Sorry for the late reply)

On Jun 7, 2010, at 4:48 AM, Nguyen Kim Son wrote:

> Hello,
>
> I'm trying to get tools such as orte-checkpoint and orte-restart to work, but I am hitting errors that I don't have any clue about.
>
> BLCR (0.8.2) apparently works fine, and I have installed Open MPI 1.4.2 from source with the BLCR option.
> The command
> mpirun -np 4 -am ft-enable-cr ./checkpoint_test
> seemed OK, but
> orte-checkpoint --term PID_of_checkpoint_test (obtained from ps -ef | grep mpirun)
> does not return and prints no error messages!

You mean the PID of 'mpirun', right?

Does it checkpoint correctly without the '--term' argument?
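
Something along these lines would cover both questions. It is only a rough sketch: the helper names find_mpirun_pid and checkpoint_job are made up for illustration, and it assumes pgrep is available and that only one mpirun is running under your user.

```shell
# find_mpirun_pid: print the PID of the first mpirun owned by the current user.
# (Hypothetical helper; assumes a single mpirun job per user.)
find_mpirun_pid() {
    pgrep -u "$(id -u)" -x mpirun | head -n 1
}

# checkpoint_job PID: checkpoint WITHOUT --term, so the job keeps running.
# (Hypothetical helper; PID must be that of mpirun, not of an application rank.)
checkpoint_job() {
    if [ -z "$1" ]; then
        echo "no mpirun PID given" >&2
        return 1
    fi
    orte-checkpoint "$1"
}

# Usage, left commented out so that sourcing this file starts nothing:
# checkpoint_job "$(find_mpirun_pid)"
```

If that returns normally, the problem is probably specific to '--term'.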

Can you try the v1.5 release candidate to see if you have the same problem?
  http://www.open-mpi.org/software/ompi/v1.5/

What MCA parameters do you have set in your environment?
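
If you are not sure, a quick way to collect them might look like the sketch below. The function names are hypothetical; it assumes parameters are set either through OMPI_MCA_* environment variables or in the per-user file $HOME/.openmpi/mca-params.conf, and it does not show anything passed on the mpirun command line.

```shell
# list_mca_env: print MCA parameters set via OMPI_MCA_* environment variables.
# (Hypothetical helper name.)
list_mca_env() {
    env | grep '^OMPI_MCA_' | sort || true
}

# list_mca_file [FILE]: print the non-comment, non-blank lines of an MCA
# parameter file. Defaults to the per-user file; prints nothing if it is absent.
# (Hypothetical helper name.)
list_mca_file() {
    f="${1:-$HOME/.openmpi/mca-params.conf}"
    if [ -r "$f" ]; then
        grep -Ev '^[[:space:]]*(#|$)' "$f" || true
    fi
}

# Usage, left commented out so that sourcing this file prints nothing:
# list_mca_env
# list_mca_file
```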

-- Josh

>
> Then I checked with
> ompi-ps
> and this time I got:
> oob-tcp: Communication retries exceeded. Can not communicate with peer
>
> Does anyone have the same problem?
> Any ideas are welcome!
> Thanks,
> Son.
>
>
> --
> ---------------------------------------------------------
> Son NGUYEN KIM
> Antibes 06600
> Tel: 06 48 28 37 47