On Mar 1, 2012, at 10:47 PM, Barnet Wagman wrote:

> I've run into a problem upgrading Open MPI from 1.4.3 to 1.4.4 or 1.4.5.
>
> With 1.4.4 and 1.4.5, I'm getting error messages like:
>
> [[59597,1],0] routed:binomial: Connection to lifeline [[59597,0],0] lost
>
> The error does not occur if I restrict the host list to localhost.
>
> Basic tests like 'mpirun hello_c' work properly. The problem occurs when using the R package Rmpi. (I've tried the R mailing lists, but so far to no avail.) Rmpi does work reliably with Open MPI 1.4.3.
>
> Could someone explain what an error message like this indicates? Is something timing out? Any idea what changed after 1.4.3 that might lead to this kind of problem?

Is the job completing? This message usually appears because mpirun terminates before the other processes do. My only concern is that the process that issued your example message is an application process, but I'm assuming it was running on the same node as mpirun, correct?

If the job is completing, then you can just ignore the message. I'm not aware of anything that changed in the 1.4 series that would have impacted termination procedures, and I haven't been seeing this behavior myself (caveat: I don't run 1.4 very often).


> FYI, I'm running Open MPI under Debian 6.0.4 (squeeze).
>
> Thanks
_______________________________________________
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users