I'm afraid that recursive calls to mpirun are not supported. The
problem is that mpirun sets environment variables (MCA params) when
launching processes - if a launched process in turn calls mpirun, the
nested mpirun inherits those params and breaks.
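If you want to see what a nested launch would inherit, a small helper like the one below can list the relevant environment entries from inside the launched process. This is only a diagnostic sketch: the function name inherited_params is made up for illustration, and it assumes the variables of interest share a common name prefix (Open MPI's exported variables generally start with "OMPI_"). It does not make a nested mpirun work.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: return every "NAME=value" entry of a NULL-terminated
// environment block whose name starts with `prefix`, in original order.
std::vector<std::string> inherited_params(const char* const* envp,
                                          const std::string& prefix) {
    assert(envp != nullptr && "environment block must not be null");
    std::vector<std::string> found;
    for (const char* const* entry = envp; *entry != nullptr; ++entry) {
        // Compare only the leading prefix.size() characters of the entry.
        if (std::strncmp(*entry, prefix.c_str(), prefix.size()) == 0) {
            found.emplace_back(*entry);
        }
    }
    return found;
}
```

On a POSIX system you could call inherited_params(environ, "OMPI_") after declaring extern char** environ; and print the result just before your system() call.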
We have at times considered adding a --recursive option to mpirun that
would try to resolve this problem, but there unfortunately isn't any
good way to do it. We can't know what MCA params the user might have
set versus what mpirun sets itself. The only way to resolve it would
be for us to prefix params set by mpirun so they could be
distinguished from those set by users - but that opens another set of
problems that are just as nasty as what we were trying to fix.
So for now, you can't do what you described. Your best option is to
use MPI_Comm_spawn for subsequent launches, if that can meet your
needs. Otherwise, you might try to reorganize your application to
avoid recursively calling mpirun. Both approaches have worked for
others in the past - hopefully, one will work for you too.
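As a rough sketch of the MPI_Comm_spawn route, the parent application could start the second program itself instead of going through system() and mpirun. The executable name "./worker" and the worker count are placeholders, and the sketch assumes the spawned program calls MPI_Init, obtains the parent communicator with MPI_Comm_get_parent, and disconnects from it before finalizing.

```cpp
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int num_workers = 3;                 // placeholder count
    std::vector<int> errcodes(num_workers);
    MPI_Comm workers = MPI_COMM_NULL;

    // Collective over MPI_COMM_WORLD: every rank calls it, and rank 0 (the
    // root argument) supplies the command, arguments, and process count.
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, num_workers, MPI_INFO_NULL,
                   0, MPI_COMM_WORLD, &workers, errcodes.data());

    // Report per-process launch results on the root.
    if (rank == 0) {
        for (int i = 0; i < num_workers; ++i) {
            if (errcodes[i] != MPI_SUCCESS) {
                std::fprintf(stderr, "worker %d failed to start\n", i);
            }
        }
    }

    // ... exchange data with the workers over the intercommunicator ...

    MPI_Comm_disconnect(&workers);             // collective with the workers
    MPI_Finalize();
    return 0;
}
```

You would still start the parent once with mpirun as usual; only the second launch changes.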
On Jun 6, 2009, at 11:47 AM, Carlos Henrique da Silva Santos wrote:
> I developed an application using openmpi in c++. This application
> internally starts (by system call) another application, which is
> also developed in c++ and openmpi. When this external application is
> called with the C system function, the following messages are shown:
> [localhost.localdomain:05275] 00B: Connectio to HNP lost
> [localhost.localdomain:05276] 00B: Connectio to HNP lost
> [localhost.localdomain:05277] 00B: Connectio to HNP lost
> Please, could someone explain what is happening in this case?
> Carlos Santos