Open MPI User's Mailing List Archives

From: Brian Barrett (brbarret_at_[hidden])
Date: 2007-09-25 21:18:25


On Sep 25, 2007, at 4:25 AM, Rayne wrote:

> Hi all, I'm using the SGE system on my school network,
> and would like to know if the errors I received below
> mean there's something wrong with my MPI_Recv
> function.
>
> [0,1,3][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
> mca_btl_tcp_frag_recv: readv failed with errno=104
> [0,1,2][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
> mca_btl_tcp_frag_recv: readv failed with errno=104

Generally, these errors indicate that the remote process has died
(on Linux, errno 104 is ECONNRESET, "Connection reset by peer").
That usually means an abnormal termination due to a segmentation
fault or the like. You might want to run the code under a debugger
to see if it shows anything useful. If your cluster doesn't have a
parallel debugger like TotalView or DDT available, you can (for small
numbers of processes) get away with using xterm and gdb, something like:

   mpirun -np X -d xterm -e gdb <application>

That will open X xterms (one per process), each with a gdb session
running one instance of the application.
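
If you want to check what the errno=104 in the messages above means on
your own machine, here is a minimal sketch in Python. The helper name
describe_errno is made up for illustration and is not part of Open MPI;
it only asks the local platform for the symbolic name and message of an
errno value. On Linux, 104 should come back as ECONNRESET, but the
numbering can differ on other systems.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this platform.

    Hypothetical helper for illustration only; not an Open MPI function.
    """
    # errno.errorcode maps numeric values to names such as "ECONNRESET"
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror gives the platform's human-readable message for the value
    return name, os.strerror(code)


if __name__ == "__main__":
    # 104 is the value reported by mca_btl_tcp_frag_recv in the error output
    name, message = describe_errno(104)
    print(f"errno 104 -> {name}: {message}")
```

A "connection reset" here only tells you that the peer went away; the
debugger run is still what shows why it died.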

Good luck,

Brian