Open MPI User's Mailing List Archives

Subject: [OMPI users] remote spawned process hangs at MPI_Init
From: dave fournier (davef_at_[hidden])
Date: 2011-10-15 13:23:49


I am trying to add a host at run time and spawn a slave process.
The slave process starts but hangs or crashes in MPI_Init().
The code for the slave process is:

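#include <mpi.h>       // MPI declarations, included explicitly
#include <admodel.h>
int main(int argc,char * argv[])
{
   ofstream ofs("junk11");
   // endl flushes, so this line reaches the file even if MPI_Init never returns
   ofs << "calling MPI_Init" << endl;
   int err=MPI_Init(&argc,&argv);
   ofs << "returned MPI_Init err = " << err << endl;
   MPI_Finalize();     // shut MPI down cleanly before exiting
   return 0;
}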

I can run the slave process via ssh as

     ssh smudge ./mpitest

and the file junk11 then contains

calling MPI_Init
returned MPI_Init err = 0

However, if I try to spawn it remotely, junk11 contains
only the line written before the call to MPI_Init:

calling MPI_Init

and the spawned process appears to have crashed.
The master process hangs at the spawn command.
The code to spawn the remote process is:

      // everyone (an MPI_Comm) and ierr are declared elsewhere
      MPI_Info infotest;
      int ierr2=MPI_Info_create(&infotest);
      MPI_Info_set( infotest, "add-hostfile", "/home/dave/hostfile" );
      MPI_Info_set( infotest, "host", "smudge" );
      int localerr=MPI_Comm_spawn("mpitest", MPI_ARGV_NULL, 1,
             infotest, 0, MPI_COMM_SELF, &everyone,
             &(ierr(1)) );

If I replace infotest with MPI_INFO_NULL, so that the second line of the
call reads

      MPI_INFO_NULL, 0, MPI_COMM_SELF, &everyone,

then mpitest is successfully spawned on the local machine.
Note that I am not using mpirun.

ompi_info output is identical for both machines

ompi_info -v ompi full --parsable
package:Open MPI dave_at_scum Distribution
ompi:version:full:1.5.4
ompi:version:svn:r25060
ompi:version:release_date:Aug 18, 2011
orte:version:full:1.5.4
orte:version:svn:r25060
orte:version:release_date:Aug 18, 2011
opal:version:full:1.5.4
opal:version:svn:r25060
opal:version:release_date:Aug 18, 2011
ident:1.5.4

How can I find out what is happening to the remote spawned process?