I am trying to add a host at run time and spawn a slave process.
The slave process starts but hangs or crashes in MPI_Init().
The code for the slave process is:
#include <admodel.h>
int main(int argc,char * argv[])
{
ofstream ofs("junk11");
ofs << "calling MPI_Init" << endl;
int err=MPI_Init(&argc,&argv);
ofs << "returned MPI_Init err = " << err << endl;
}
I can run the slave process via ssh as
ssh smudge ./mpitest
and the file junk11 then contains
calling MPI_Init
returned MPI_Init err = 0
However, if I try to spawn it remotely, junk11 contains
only the line written before the call to MPI_Init:
calling MPI_Init
and the spawned process appears to have crashed.
The master process hangs at the spawn command.
The code to spawn the remote process is
MPI_Info infotest;
int ierr2=MPI_Info_create(&infotest);
MPI_Info_set( infotest, "add-hostfile", "/home/dave/hostfile" );
MPI_Info_set( infotest, "host", "smudge" );
int localerr=MPI_Comm_spawn("mpitest", NULL, 1,
infotest, 0, MPI_COMM_SELF, &everyone,
&(ierr(1)) );
If I change the second line of the call above to
MPI_INFO_NULL, 0, MPI_COMM_SELF, &everyone,
then mpitest is successfully spawned on the local machine.
Note that I am not using mpirun.
The ompi_info output is identical on both machines:
ompi_info -v ompi full --parsable
package:Open MPI dave_at_scum Distribution
ompi:version:full:1.5.4
ompi:version:svn:r25060
ompi:version:release_date:Aug 18, 2011
orte:version:full:1.5.4
orte:version:svn:r25060
orte:version:release_date:Aug 18, 2011
opal:version:full:1.5.4
opal:version:svn:r25060
opal:version:release_date:Aug 18, 2011
ident:1.5.4
How can I find out what is happening to the remote spawned process?
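One way to narrow this down might be to record the launch context of mpitest before it calls MPI_Init, and then compare the ssh run with the spawned run. The following is only a sketch under assumptions: the function name log_launch_context and the choice of variables to record (PATH, LD_LIBRARY_PATH) are hypothetical and not part of the original test program. It uses only standard C++ and POSIX calls, so it does not depend on MPI being initialised. It will not show why MPI_Init fails, only whether the two launch modes see a different host, working directory or environment.

```cpp
// Hypothetical pre-MPI_Init diagnostic logger (standard C++ and POSIX only).
#include <cstdlib>
#include <fstream>
#include <string>
#include <unistd.h>

// Return the value of an environment variable, or "(unset)" if it is absent.
static std::string env_or_unset(const char* name)
{
  const char* value = std::getenv(name);
  return value ? std::string(value) : std::string("(unset)");
}

// Append host, pid, working directory and selected environment variables
// to `path`. Each line ends with std::endl, which flushes the stream, so
// the log should survive a later crash of the process.
// Returns true if the file could be opened and every write succeeded.
bool log_launch_context(const std::string& path)
{
  std::ofstream ofs(path.c_str(), std::ios::app);
  if (!ofs) return false;

  char host[256] = "";   // zero-initialised, so always null-terminated
  if (gethostname(host, sizeof(host) - 1) != 0) host[0] = '\0';

  char cwd[4096] = "";
  if (getcwd(cwd, sizeof(cwd)) == NULL) cwd[0] = '\0';

  ofs << "host=" << host << std::endl;
  ofs << "pid=" << getpid() << std::endl;
  ofs << "cwd=" << cwd << std::endl;
  ofs << "PATH=" << env_or_unset("PATH") << std::endl;
  ofs << "LD_LIBRARY_PATH=" << env_or_unset("LD_LIBRARY_PATH") << std::endl;
  return ofs.good();
}
```

Calling log_launch_context with some file name of your choice as the first statement of main in mpitest, once via ssh and once via the spawn, would give two records to compare. A difference in cwd or LD_LIBRARY_PATH between the two would be a candidate explanation, though not a confirmed one.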