We've had a few reports of this. It looks like someone made a change to R that triggers the warning. Basically, the OpenFabrics driver for InfiniBand doesn't support fork(): memory corruption can result if the process inadvertently does the "wrong thing" at some point after the fork. Hence, we emit a warning whenever we see a fork() while InfiniBand is in use with the OFED verbs driver. That is presumably why you see it only on the cluster: your PC most likely isn't using InfiniBand.
You can suppress the warning by adding -mca mpi_warn_on_fork 0 to your mpirun command line. You will probably be okay, but be aware that you could hit issues.
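For reference, here is a rough sketch of the two usual ways to set that parameter. The process count and script name in the commented mpirun line are placeholders, not taken from your job, and the environment-variable form assumes your LSF job script exports its environment to the MPI processes:

```shell
# Option 1: pass the MCA parameter on the mpirun command line.
# "-np 8" and "my_simulation.R" are placeholders for illustration only.
#   mpirun -mca mpi_warn_on_fork 0 -np 8 Rscript my_simulation.R

# Option 2: set the same parameter through the environment before launching.
# Open MPI reads MCA parameters from variables named OMPI_MCA_<parameter>.
export OMPI_MCA_mpi_warn_on_fork=0

# Show the value the job will inherit.
echo "mpi_warn_on_fork=${OMPI_MCA_mpi_warn_on_fork}"
```

Either form only silences the message; it does not make fork() any safer with the verbs driver.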
On May 16, 2012, at 6:17 AM, Jim Maas wrote:
> I'm getting the following error with a new version of R, using Rmpi and a few other modules. I've already had a couple of good suggestions from this group about how to diagnose the cause of the fork error using "strace" but we don't have it on our LSF Linux cluster. This is my first use of R/mpi/parallel etc so am a bit naive. Also the code I'm running involves random number generation so will always give slightly different answers.
> My normal routine is to:
> a) try the code with a small number of iterations on my own Linux/R/open-mpi pc using 8 cores, then
> b) make the job bigger and run it on the cluster.
> I only get the warning on the cluster, which suggests that it is caused by something related to R and/or Rmpi and/or LSF and/or Open MPI ???
> Could someone suggest some rigorous R test-code that I could run on my pc, ok if it takes some time, and then rerun it on the cluster to confirm that I get the same results, and thus that the warning is inconsequential?
> An MPI process has executed an operation involving a call to the
> "fork()" system call to create a child process. Open MPI is currently
> operating in a condition that could result in memory corruption or
> other system errors; your MPI job may hang, crash, or produce silent
> data corruption. The use of fork() (or system() or other calls that
> create child processes) is strongly discouraged.
> The process that invoked fork was:
> Local host: cn159.private.dns.zone (PID 12792)
> MPI_COMM_WORLD rank: 7
> If you are *absolutely sure* that your application will successfully
> and correctly survive a call to fork(), you may disable this warning
> by setting the mpi_warn_on_fork MCA parameter to 0.
> Dr. Jim Maas
> University of East Anglia