
Open MPI User's Mailing List Archives


From: Rayne (lancer6238_at_[hidden])
Date: 2007-09-25 06:25:20


Hi all, I'm using the SGE system on my school network,
and would like to know whether the errors below mean
there's something wrong with my MPI_Recv calls.

[0,1,3][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
mca_btl_tcp_frag_recv: readv failed with errno=104
[0,1,2][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
mca_btl_tcp_frag_recv: readv failed with errno=104
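
For reference, the numeric errno in a log line like the above can be
turned into text with strerror(). The snippet below is only a sketch;
format_errno and is_conn_reset are hypothetical helper names, not part
of Open MPI. errno numbers are platform-specific: on common Linux
systems 104 is ECONNRESET ("Connection reset by peer"), but that
mapping is an assumption about the machine that produced the log.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "errno=<code>: <message>" into buf.
 * Returns the value of snprintf (number of characters that would
 * have been written, excluding the terminating NUL). */
static int format_errno(int code, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "errno=%d: %s", code, strerror(code));
}

/* Hypothetical helper: true if code is this platform's ECONNRESET.
 * The numeric value differs between operating systems. */
static int is_conn_reset(int code)
{
    return code == ECONNRESET;
}
```

Calling format_errno(104, buf, sizeof buf) on the node that logged the
error and printing buf shows what 104 means there.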

In my code, I have

/* executed by P1 to P(p-1) */
for (row = 1 ; row <= size[0] ; row++)
   MPI_Send(&(cell[row][1]), length, stable, 0, rank, MPI_COMM_WORLD);

<some other computations>

/* P0 receive from P1 to P(p-2) */
for (source = 1 ; source < (p-1) ; source++)
   for (r = 1 ; r <= size[0] ; r++)
       MPI_Recv(&(cell[r][1]) + (source-1)*mlength, mlength, stable,
                source, source, MPI_COMM_WORLD, &status);

/* P0 receive from P(p-1) */
for (r = 1 ; r <= size[0] ; r++)
   MPI_Recv(&(cell[r][1]) + (p-2)*mlength, size[k] - (p-2)*mlength,
            stable, p-1, p-1, MPI_COMM_WORLD, &status);

When I added some printf statements to see where the
errors occur, they usually appear partway through the
first MPI_Recv loop, usually when source is 2. The
value of r varies, i.e. the error does not seem to
occur at the same row each time.

Basically what I'm trying to do is:
Say there are a total of 4 processes (p=4), P0 - P3.
P1 and P2 each have a (size[0]+1)-by-(mlength+1)
matrix "cell", and P3 has a (size[0]+1)-by-(length+1)
matrix "cell". On P1 and P2, length = mlength.
size[k] = (p-2)*mlength + length (the value of length on P3)

I'm trying to send the matrix "cell" in P1, P2 and P3
to P0, then have P0 combine them into one
(size[0]+1)-by-(size[k]+1) matrix "cell". I'm sending
the matrix row-by-row.

In short, say the matrices in P1, P2 and P3 are
---- ---- -----
-### -ooo -@@@@
-### -ooo -@@@@
-### -ooo -@@@@

respectively. size[0] = 3, size[k] = 10, length = 3
for P1 and P2, length = 4 for P3 and mlength = 3.

I now need to combine them into 1 table in P0:
-----------
-###ooo@@@@
-###ooo@@@@
-###ooo@@@@
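
The intended placement can be sketched without MPI, just to show the
index arithmetic. This is a minimal sketch under the assumptions
stated above (column 0 is a border, the first p-2 workers each own
mlength columns, and the last worker owns the remainder of size[k]).
The helpers block_offset, block_width and place_row are hypothetical
names, and char stands in for the element type behind "stable".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Offset, counted from column 1 of the combined row, at which the
 * block from worker `source` (1 .. p-1) starts. */
static size_t block_offset(int source, size_t mlength)
{
    assert(source >= 1);
    return (size_t)(source - 1) * mlength;
}

/* Number of columns owned by worker `source`: mlength for
 * P1 .. P(p-2), and the remainder of `total` for P(p-1). */
static size_t block_width(int source, int p, size_t mlength, size_t total)
{
    assert(source >= 1 && source <= p - 1);
    if (source < p - 1)
        return mlength;
    return total - (size_t)(p - 2) * mlength;
}

/* Copy one worker row (border in column 0, data from column 1)
 * into its slot in the combined row, which has total+1 columns. */
static void place_row(char *combined_row, const char *worker_row,
                      int source, int p, size_t mlength, size_t total)
{
    size_t off = block_offset(source, mlength);
    size_t width = block_width(source, p, mlength, total);

    /* The block must fit inside the combined row. */
    assert(off + width <= total);
    memcpy(combined_row + 1 + off, worker_row + 1, width);
}
```

With p = 4, mlength = 3 and size[k] = 10, the three blocks land at
offsets 0, 3 and 6 with widths 3, 3 and 4, which reproduces the
combined table above. The same offset and width are what the receive
address and count in each MPI_Recv need to agree with.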

What is strange is that I combine matrices this way
more than once in my DNA sequence alignment program,
and the error occurs only when it tries to combine
the matrices for one or two particular sequences, but
not for the others.

Please help.

Thank you.

Regards,
Rayne
