Open MPI User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-07-28 08:34:23


Somehow I missed this e-mail, sorry... Can you send all the
information listed on this web page:

     http://www.open-mpi.org/community/help/

On Jun 26, 2007, at 10:34 AM, Yuan Wan wrote:

>
> Hi all,
>
> I'm benchmarking our new cluster with HPL. I picked Open MPI as the
> parallel environment because I found that Open MPI is able to benefit
> from the two gigabit Ethernet TCP networks on our cluster during
> low-level benchmarks (bandwidth could be up to 250 MB/s).
>
> The HPL code builds fine and runs well for small problem sizes.
> However, when I run the code on 32 nodes (128-way), it crashes
> partway through with the following error message:
>
> ---------------------------------------------
> [node074:09973] mca_btl_tcp_frag_send: writev failed with errno=104
> [node074:09973] mca_btl_tcp_frag_send: writev failed with errno=104
> [node073:10234] mca_btl_tcp_frag_send: writev failed with errno=104
> [node073:10234] mca_btl_tcp_frag_send: writev failed with errno=104
> [node089:29190] mca_btl_tcp_frag_send: writev failed with errno=104
> [node090:27881] mca_btl_tcp_frag_send: writev failed with errno=104
> [node072:02729] mca_btl_tcp_frag_send: writev failed with errno=104
> [node071:03029] mca_btl_tcp_frag_send: writev failed with errno=104
> .....
> [node084:06044] mca_btl_tcp_frag_send: writev failed with errno=104
> [node086:01346] mca_btl_tcp_frag_send: writev failed with errno=104
> [node069:16372] mca_btl_tcp_frag_send: writev failed with errno=104
> [node100:23294] mca_btl_tcp_frag_send: writev failed with errno=104
> [node069:16372] mca_btl_tcp_frag_send: writev failed with errno=104
> [node085:04347] mca_btl_tcp_frag_send: writev failed with errno=104
> [node087:31391] mca_btl_tcp_frag_send: writev failed with errno=104
> ---------------------------------------------
>
> Following the FAQ entry linked below, I explicitly specified the
> interface names of the two TCP networks, but the code still breaks:
>
> mpirun --mca btl_tcp_if_include eth0,eth1 -np 128 -bynode -machinefile hostfile ./xhpl
>
> http://icl.cs.utk.edu/open-mpi/faq/?category=tcp#tcp-selection
>
> If I include only one TCP network, the code doesn't break, but the
> performance is not desirable.
>
>
> Does anyone know how to fix it?
>
> --Yuan
>
>
> Yuan Wan
> ---
> Unix Section
> Information Services Infrastructure Division
> University of Edinburgh
>
> tel: 0131 650 4985
> email: ywan_at_[hidden]
>
> 2032 Computing Services, JCMB
> The King's Buildings,
> Edinburgh, EH9 3JZ
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

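For reference while reading the log above: on typical Linux systems, errno 104 is ECONNRESET ("Connection reset by peer"), i.e. the remote end of the TCP connection went away during the writev. The small sketch below decodes an errno value on whatever machine it runs on. The helper name `describe_errno` is made up for illustration and is not part of Open MPI, and the numbering differs on non-Linux platforms, so run it on one of the compute nodes.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for an errno value on the current platform."""
    # errno.errorcode maps numeric values to symbolic names (platform-specific).
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the C library's message for that value.
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # 104 is the value reported by mca_btl_tcp_frag_send in the log.
    print(describe_errno(104))
```

This only identifies the error; it does not say why the peer closed the connection, which is what the information requested above should help narrow down.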
-- 
Jeff Squyres
Cisco Systems