On Jan 13, 2010, at 9:58 PM, SpiduS Okami wrote:
> I would like to know if someone could help me with the following error:
> [fenrir][[9567,1],1][../../../../../../ompi/mca/btl/tcp/btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
> I am trying to run the hpcc program on a Beowulf-type cluster with 2, 3, and 4 machines. When I use problem sizes of 10,000 and up, it gives me this error. Does anyone know what could cause this, and how I can solve it?
This *usually* means that an MPI process has died unexpectedly; one of its peers noticed the death because the TCP socket between them was closed.
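The "(104)" in the log is the errno from readv(), and 104 is ECONNRESET on Linux. As a rough, standalone illustration (this is not Open MPI's code), the sketch below opens a loopback TCP connection, "kills" one end abortively, and shows what the surviving end's readv() reports. The helper name `simulate_peer_reset` is hypothetical. It uses SO_LINGER with a zero timeout to force the kernel to send a TCP RST deliberately. A process that really crashes has its sockets closed by the kernel, which typically produces an RST (and therefore ECONNRESET on the peer) when unread data was still queued for it, or when more data arrives after it is gone. Otherwise the peer usually just sees EOF.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: returns the errno seen by readv() on the surviving
 * end after the other end is closed abortively, 0 if readv() returned
 * EOF or data instead, or -1 if the socket setup itself failed. */
int simulate_peer_reset(void)
{
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */
    socklen_t len = sizeof addr;
    if (bind(listener, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(listener, 1) < 0 ||
        getsockname(listener, (struct sockaddr *)&addr, &len) < 0) {
        close(listener);
        return -1;
    }

    /* "survivor" plays the MPI process that is still alive. */
    int survivor = socket(AF_INET, SOCK_STREAM, 0);
    if (survivor < 0) {
        close(listener);
        return -1;
    }
    if (connect(survivor, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(survivor);
        close(listener);
        return -1;
    }

    /* "dying" plays the peer process that is about to disappear. */
    int dying = accept(listener, NULL, NULL);
    if (dying < 0) {
        close(survivor);
        close(listener);
        return -1;
    }

    /* A zero linger timeout makes close() send RST instead of FIN. */
    struct linger lg = { .l_onoff = 1, .l_linger = 0 };
    setsockopt(dying, SOL_SOCKET, SO_LINGER, &lg, sizeof lg);
    close(dying);

    /* The survivor's readv() now fails, as in btl_tcp_frag.c's log line. */
    char buf[64];
    struct iovec iov = { .iov_base = buf, .iov_len = sizeof buf };
    ssize_t n = readv(survivor, &iov, 1);
    int result = (n < 0) ? errno : 0;

    close(survivor);
    close(listener);
    return result;
}
```

On Linux, `simulate_peer_reset()` is expected to return ECONNRESET, the same condition reported as "Connection reset by peer (104)" above. The point is that the error is only the symptom seen by the surviving process; the cause lies with whichever process went away.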
You might want to poke around and see if there are core files or the like that explain why an MPI process died.
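Core files are often not written at all because the core-file size limit defaults to 0. The usual shell fix is `ulimit -c unlimited` (capped by the hard limit). For an MPI job it generally has to be in effect on every node where processes run, not just the node where you start mpirun. As a minimal sketch of the same idea from inside a program, the hypothetical helper `enable_core_dumps` below raises the soft RLIMIT_CORE to the hard limit. Where the core actually lands is OS configuration; on Linux, for example, it is governed by `/proc/sys/kernel/core_pattern`.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise this process's soft core-file size limit to
 * its hard limit so a crash can leave a core file behind.
 * Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    /* An unprivileged process may raise its soft limit up to the hard limit. */
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Calling this early in `main()` only affects that process and its children; it cannot raise the hard limit, so if the hard limit is 0 the administrator (or the node's startup configuration) has to change it.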