On Wed, 16 Jan 2013 07:46:41 -0800
Ralph Castain <rhc_at_[hidden]> wrote:
> This one means that a backend node lost its connection to mpirun. We use a TCP socket between the daemon on a node and mpirun to launch the processes and to detect if/when that node fails for some reason.
Hm. And what would be the reasons for this? Too much load on the node where mpirun is running?
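
To make the mechanism quoted above concrete, here is a minimal sketch of how one end of a TCP connection can notice that the other end has gone away. It is not Open MPI's actual code: the names `peer_is_gone` and `demo` are hypothetical, and both "mpirun" and the "daemon" are just loopback sockets in a single process. It only shows the general idea that a read returning zero bytes (or a socket error) means the peer closed or dropped the link.

```python
import socket


def peer_is_gone(conn, timeout=1.0):
    """Return True if the other end of a TCP connection closed or reset it.

    Hypothetical helper for illustration; not part of Open MPI.
    """
    conn.settimeout(timeout)
    try:
        # MSG_PEEK looks at pending data without consuming it.
        data = conn.recv(1, socket.MSG_PEEK)
    except socket.timeout:
        return False  # nothing to read within the timeout, link still up
    except OSError:
        return True   # connection reset or another socket-level failure
    return data == b""  # empty read means the peer closed the connection


def demo():
    # Stand-ins: the accepted socket plays mpirun's side of the link,
    # the connecting socket plays the daemon on a backend node.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    daemon = socket.create_connection(listener.getsockname())
    mpirun_side, _ = listener.accept()

    before = peer_is_gone(mpirun_side, timeout=0.2)  # daemon still connected
    daemon.close()                                   # simulate losing the node
    after = peer_is_gone(mpirun_side, timeout=1.0)

    mpirun_side.close()
    listener.close()
    return before, after
```

Calling `demo()` checks the link once while the daemon socket is open and once after it is closed. A cleanly closed socket is the easy case. A node that hangs or a network that silently drops packets produces no close at all, so real systems usually add timeouts or keepalive probes on top of this.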