I am getting the following error, amidst many successful message sends:
[n10][[50048,1],1][../../../../../../ompi/mca/btl/tcp/btl_tcp_frag.c:118:mca_btl_tcp_frag_send] mca_btl_tcp_frag_send: writev error (0x7f6155970038, 578659815)
Any ideas about what is going on or what I can do to fix it?
I am using the openmpi-bin 1.4.2-4 Debian package on a cluster
running Debian squeeze.
I couldn't find a config.log file; there is
/etc/openmpi/openmpi-mca-params.conf, which is completely commented
Invocation is from R 3.0.1 (debian package) with Rmpi 0.6.3 built by
me from source in a local directory. My sends all use mpi.isend.Robj
and the receives use mpi.recv.Robj, both from the Rmpi library.
The jobs were started with rmpilaunch; it and the hosts file are
included in the attachments. TCP connections. rmpilaunch leaves me
in an R session on the master. I invoked the code inside the
toplevel() function toward the bottom of dbox-master.R.
The program source files and other background information is in the
attached file. n10 has the output of
ompi_info --all, and
n1011 has other info for both nodes that were active (n10 was
master; n11 had some slaves).