On Tue, 2009-05-19 at 08:29 -0400, Jeff Squyres wrote:
> fork() support in OpenFabrics has always been dicey -- it can lead to
> random behavior like this. Supposedly it works in a specific set of
> circumstances, but I don't have a recent enough kernel on my machines
> to test.
> It's best not to use calls to system() if they can be avoided.
> Indeed, Open MPI v1.3.x will warn you if you create a child process
> after MPI_INIT when using OpenFabrics networks.
My C++ OMPI program uses system() to invoke an external mesh partitioner
program after MPI_INIT is called. Sometimes (with frustrating
randomness), on systems using OFED, the system() call fails with EFAULT
(Bad address). The Linux kernel appears to believe that the string
passed to execve() is not in the process's address space. The exec
string is constructed immediately before calling system(), like this:
    ss << "partitioner_program " << COMM_WORLD_SIZE;
    system( ss.str().c_str() );
Could this behavior be related to this admonition?
Also, would MPI_COMM_SPAWN suffer from the same difficulties?