Open MPI User's Mailing List Archives

Subject: [OMPI users] "Connection to lifeline lost" when developing a new rsh agent
From: Yann RADENAC (Yann.Radenac_at_[hidden])
Date: 2012-08-20 08:11:04


Hi,

I'm developing MPI support for XtreemOS (www.xtreemos.eu) so that an MPI
program is managed as a single XtreemOS job.
To manage all processes as a single XtreemOS job, I've developed the
program xos-createProcess, which plays the role of the rsh agent
(replacing ssh/rsh): it starts a process on a remote machine that is
one of the machines reserved for the current job.
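
To illustrate the role an rsh agent plays, here is a minimal sketch of a
stand-in agent for local experiments. It is not xos-createProcess itself.
It assumes the launcher invokes the agent ssh-style, with the target host
first and the command to run after it. The function name fake_rsh_agent
and the AGENT_LOG variable are made up for this example.

```shell
#!/bin/sh
# Hypothetical stand-in for an rsh agent (illustration only).
# Assumed calling convention (ssh-style): agent <host> <command> [args...]
fake_rsh_agent() {
    host="$1"
    shift
    # Record what the launcher asked for; the log location is an arbitrary choice.
    printf '%s: %s\n' "$host" "$*" >> "${AGENT_LOG:-/tmp/fake_rsh_agent.log}"
    # Run the command locally instead of on $host, with stdin, stdout and
    # stderr left attached to the caller, and return its exit status.
    "$@"
}
```

Logging the arguments this way is one possible way to compare what the
launcher passes to the agent in the 2-machine and 3-machine cases.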

I'm running a simple hello world MPI program in which each process sends
a string to process 0, which then prints them on standard output.

When using Open MPI with ssh, this program works perfectly on several
machines.

When using Open MPI with my launcher xos-createProcess, it works with an
MPI program of 2 processes on 2 different machines.

However, I cannot get past the following error, which occurs when
running an MPI program of 3 processes on 3 different machines (or any n
processes on n different machines with n >= 3).

A process started by xos-createProcess on a remote machine ends with the
following error:

[paradent-5.rennes.grid5000.fr:08191] [[50627,0],2] routed:binomial:
Connection to lifeline [[50627,0],0] lost

But process 0 is still running, so the lifeline should not have been lost.
In fact, process 0 is still waiting for the remote processes to terminate
(checked with gdb: the initial process is inside libc's poll()).

The run command is:

-bash -c '(mpirun --mca orte_rsh_agent xos-createProcess
--leave-session-attached -np 2 -host `xreservation -a $XOS_RSVID`
mpi/hello_world_MPI < /dev/null > mpirun.out) >& mpirun.err'

The same problem occurs with or without the option --leave-session-attached.

So, how is the lifeline implemented? Why does it work with 2 processes
but fail with 3 or more?

I'm using Open MPI 1.6.

Thanks for your help.

-- 
Yann Radenac
Research Engineer, INRIA
Myriads research team, INRIA Rennes - Bretagne Atlantique