Can you try with the current trunk head (r24296)?
I just committed a fix for the C/R functionality in which restarts were getting stuck. This will likely affect the migration functionality, but I have not had an opportunity to test just yet.
Another thing to check is that prelink is turned off on all of your machines.
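In case it is useful, here is a rough sketch of one way to check that, not a definitive procedure. It assumes Red Hat-style machines, where the prelink cron job is controlled by a PRELINKING=yes|no line in /etc/sysconfig/prelink; the function name prelink_enabled is just something made up for illustration, and the path may differ on your distribution.

```shell
#!/bin/sh
# Sketch only. Assumes a Red Hat-style layout in which the prelink
# cron job reads PRELINKING=yes|no from /etc/sysconfig/prelink.
# prelink_enabled is a hypothetical helper name, not part of any tool.

# Return 0 (true) if the given config file enables prelinking,
# non-zero if it is disabled or the file cannot be read.
prelink_enabled() {
    conf="${1:-/etc/sysconfig/prelink}"
    [ -r "$conf" ] || return 1
    grep -Eq '^[[:space:]]*PRELINKING="?yes"?[[:space:]]*$' "$conf"
}
```

You would run this on each machine involved (for example node3 and node9). Where it reports that prelinking is enabled, setting PRELINKING=no in that file and running `prelink --undo --all` as root should revert the binaries and libraries that were already prelinked.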
Let me know if the problem persists, and I'll dig into it a bit more.
On Jan 24, 2011, at 11:37 AM, Hugo Meyer wrote:
> Hello all,
> I've got a problem when I try to use the ompi-migrate command.
> What I'm doing is running, for example, the following application on one node of a cluster (both processes will run on the same node):
> mpirun -np 2 -am ft-enable-cr ./whoami 10 10
> Then, on the same node, I try to migrate the processes to another node:
> ompi-migrate -x node9 -t node3 14914
> And then i get this message:
> [clus9:15620] *** Process received signal ***
> [clus9:15620] Signal: Segmentation fault (11)
> [clus9:15620] Signal code: Address not mapped (1)
> [clus9:15620] Failing at address: (nil)
> [clus9:15620] [ 0] /lib64/libpthread.so.0 [0x2aaaac0b8d40]
> [clus9:15620] *** End of error message ***
> Segmentation fault
> I assume that maybe there is something wrong with the thread level, but I have configured Open MPI like this:
> ../configure --prefix=/home/hmeyer/desarrollo/ompi-code/binarios/ --enable-debug --enable-debug-symbols --enable-trace --with-ft=cr --disable-ipv6 --enable-opal-multi-threads --enable-ft-thread --without-hwloc --disable-vt --with-blcr=/soft/blcr-0.8.2/ --with-blcr-libdir=/soft/blcr-0.8.2/lib/
> Checkpoint and restart work fine with a single process: ompi-checkpoint and ompi-restart work great. But when I restore an application that has more than one process, it is restored and runs up to the last line before MPI_FINALIZE(), and then the processes never finalize. I assume that they never call MPI_FINALIZE().
> Best regards.
> Hugo Meyer
Postdoctoral Research Associate
Oak Ridge National Laboratory