Docs Mailing List Archives

Subject: Re: [OMPI docs] help me!
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-06-21 14:22:28


This is more of a question for the users' list; you probably want to
re-post it there.

Also note that the checkpoint-restart capabilities are available only in
pre-release v1.3 snapshot tarballs; as such, they are still in active
development.

On Jun 21, 2008, at 2:16 PM, Yen Phi wrote:

> Hi all,
> I run my job with Open MPI and then checkpoint it; the checkpoint is
> taken when my job ends. When I try to restart it, I get the message
> below. I don't know why. Please help me.
> [root@localhost ~]# mpirun -np 4 -am ft-enable-cr hello
> [root@localhost ~]# ompi-checkpoint 19632
> Snapshot Ref.: 0 ompi_global_snapshot_19632.ckpt
> [root@localhost ~]# ompi-restart ompi_global_snapshot_19632.ckpt
> [localhost:19649] *** Process received signal ***
> [localhost:19649] Signal: Segmentation fault (11)
> [localhost:19649] Signal code: Address not mapped (1)
> [localhost:19649] Failing at address: 0x1
> [localhost:19649] [ 0] [0x110440]
> [localhost:19649] [ 1] /usr/local/lib/libopen-rte.so.0(orte_rmaps_base_claim_slot+0x17b) [0x15db1f]
> [localhost:19649] [ 2] /usr/local/lib/openmpi/mca_rmaps_round_robin.so [0x23cb84]
> [localhost:19649] [ 3] /usr/local/lib/openmpi/mca_rmaps_round_robin.so [0x23d3ae]
> [localhost:19649] [ 4] /usr/local/lib/libopen-rte.so.0(orte_rmaps_base_map_job+0x105) [0x15c61d]
> [localhost:19649] [ 5] /usr/local/lib/libopen-rte.so.0(orte_plm_base_setup_job+0xd3) [0x156077]
> [localhost:19649] [ 6] /usr/local/lib/openmpi/mca_plm_rsh.so [0x1fecc3]
> [localhost:19649] [ 7] mpirun [0x804a79d]
> [localhost:19649] [ 8] mpirun [0x8049e76]
> [localhost:19649] [ 9] /lib/libc.so.6(__libc_start_main+0xe0) [0x9a0390]
> [localhost:19649] [10] mpirun [0x8049da1]
> [localhost:19649] *** End of error message ***
> Segmentation fault
> Thanks
> Yen

-- 
Jeff Squyres
Cisco Systems