The openib BTL and BLCR support in Open MPI were working together about a year ago (the last time I checked). The psm component, however, is not supported at the moment.

From the error, I suspect that we are not fully closing the openib BTL driver before the checkpoint, so when we try to restart, it looks for a resource that is no longer present. I have created a ticket to investigate further, if you want to follow it:
  https://svn.open-mpi.org/trac/ompi/ticket/3417

Unfortunately, I do not know who is currently supporting that code path (I might pick it back up at some point, but I cannot promise anything in the near future). However, I will keep an eye on the ticket and see what I can do. If it is what I think it is, it should not take much work to get it working again.

-- Josh

On Wed, Nov 28, 2012 at 5:14 AM, William Hay <w.hay@ucl.ac.uk> wrote:
I'm trying to build Open MPI with support for BLCR plus QLogic InfiniBand (plus Grid Engine). Everything seems to compile OK and checkpoints are taken, but whenever I try to restore a checkpoint I get the following error:
- do_mmap(<file>, 00002aaab18c7000, 0000000000001000, ...) failed: ffffffffffffffea
- mmap failed: /dev/ipath
- thaw_threads returned error, aborting. -22
- thaw_threads returned error, aborting. -22
Restart failed: Invalid argument

This occurs whether I specify psm or openib as the BTL.

This looks like the sort of thing I would expect to be handled by the BLCR support code in Open MPI, so I have a couple of questions:
1) Are InfiniBand and BLCR support in Open MPI compatible?
2) Are there any special tricks necessary to get them working together?


_______________________________________________
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Joshua Hursey
Assistant Professor of Computer Science
University of Wisconsin-La Crosse
http://cs.uwlax.edu/~jjhursey