Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-01-14 17:52:07


On Jan 13, 2010, at 9:58 PM, SpiduS Okami wrote:

> I would like to know if someone could help me with the following error:
>
> [fenrir][[9567,1],1][../../../../../../ompi/mca/btl/tcp/btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
>
> I am trying to run the hpcc program on a Beowulf-type cluster with 2, 3, and 4 machines. When I use 10.000 problems and up, it gives me this error. Does anyone know what could cause this, and how I can solve it?

This *usually* means that an MPI process has died unexpectedly; one of its peers noticed the death because the TCP socket connecting the two was closed out from under it.
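
To make the mechanism concrete, here is a minimal sketch (plain Python sockets, not Open MPI code) of how a reader ends up with "Connection reset by peer". It assumes a loopback connection, and it uses SO_LINGER with a zero timeout as a stand-in for a peer that went away abruptly; a real crashed MPI process may not produce a reset in exactly this way. The function name provoke_reset is made up for this illustration.

```python
import errno
import socket
import struct


def provoke_reset():
    """Return the errno a client sees when its peer aborts the connection."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    client.settimeout(5.0)
    peer, _ = listener.accept()

    # Assumption for illustration: SO_LINGER with a zero timeout makes
    # close() send a TCP RST instead of an orderly FIN, which imitates a
    # peer that disappeared without shutting the connection down cleanly.
    peer.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    peer.close()

    try:
        client.recv(1)
        return None  # orderly shutdown: recv returned without an error
    except ConnectionResetError as exc:
        return exc.errno
    finally:
        client.close()
        listener.close()


if __name__ == "__main__":
    code = provoke_reset()
    print(code, errno.errorcode.get(code))
```

On Linux, errno.ECONNRESET has the value 104, which is the number in parentheses at the end of the error message above. The surviving side only learns that the connection is gone; it gets no information about why the other process exited.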

You might want to poke around on each node and see if there are core files or somesuch that explain why an MPI process died...?
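
As a rough sketch of that "poking around", the script below walks a directory tree and lists files that look like core dumps. It assumes the common naming convention of "core" or "core.<something>"; the actual name and location depend on the system's core pattern setting, and core dumps may be disabled altogether by the shell's core size limit (ulimit -c). The function name find_core_files is made up for this example.

```python
import os


def find_core_files(root):
    """Return paths under root whose file name is 'core' or starts with 'core.'."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Assumed naming convention; adjust to match the local core pattern.
            if name == "core" or name.startswith("core."):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


if __name__ == "__main__":
    # Run this on each node, in the directory the MPI job was started from.
    for path in find_core_files(os.getcwd()):
        print(path, os.path.getsize(path), "bytes")
```

If nothing turns up, it is worth checking that core dumps are enabled on the compute nodes before concluding that no process crashed.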

-- 
Jeff Squyres
jsquyres_at_[hidden]