Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] writev error: Bad address
From: Ross Boylan (ross_at_[hidden])
Date: 2014-02-05 21:58:25


On 1/31/2014 1:08 PM, Ross Boylan wrote:
> I am getting the following error, amidst many successful message sends:
> [n10][[50048,1],1][../../../../../../ompi/mca/btl/tcp/btl_tcp_frag.c:118:mca_btl_tcp_frag_send] mca_btl_tcp_frag_send: writev error (0x7f6155970038, 578659815)
> Bad address(1)
I think I've tracked down the immediate cause: I was sending a very
large object (from R; I assume it was serialized into a byte stream)
that was over 3 GB. I'm not sure why that would produce this particular
error, but it isn't surprising that something would go wrong at that size.
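
For anyone who lands on this thread with the same symptom: the classic MPI C
interface takes message counts as a C int, so a single message larger than
2^31 - 1 bytes is outside what a plain byte-count send can describe. Whether
that limit is what actually triggers the writev error above is an assumption,
not something confirmed in this thread. The sketch below is plain Python, not
Rmpi, and the helper names (fits_in_c_int, chunk_ranges) are hypothetical; it
only illustrates checking a payload size against the int limit and splitting
it into pieces that each fit.

```python
# Largest value of a signed 32-bit C int, the type MPI's classic C API
# uses for message counts.
INT_MAX = 2**31 - 1


def fits_in_c_int(n_bytes):
    """Return True if a byte count can be expressed as a non-negative C int."""
    return 0 <= n_bytes <= INT_MAX


def chunk_ranges(total_bytes, max_chunk=INT_MAX):
    """Yield (offset, length) pairs that cover total_bytes in order.

    Every length is at most max_chunk, so each piece could be sent as
    its own message. (Hypothetical helper, not part of any MPI binding.)
    """
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    offset = 0
    while offset < total_bytes:
        length = min(max_chunk, total_bytes - offset)
        yield offset, length
        offset += length


if __name__ == "__main__":
    payload_size = 3 * 1024**3  # a 3 GiB serialized object, as an example
    print("fits in one message:", fits_in_c_int(payload_size))
    for offset, length in chunk_ranges(payload_size):
        print("send bytes", offset, "to", offset + length - 1)
```

In practice the simpler fix may be to avoid shipping one multi-gigabyte object
at all, for example by sending the data in several smaller objects and
reassembling them on the receiver.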

Ross
> Any ideas about what is going on or what I can do to fix it?
>
> I am using the openmpi-bin 1.4.2-4 Debian package on a cluster running
> Debian squeeze.
>
> I couldn't find a config.log file; there is
> /etc/openmpi/openmpi-mca-params.conf, which is completely commented out.
>
> Invocation is from R 3.0.1 (debian package) with Rmpi 0.6.3 built by
> me from source in a local directory. My sends all use mpi.isend.Robj
> and the receives use mpi.recv.Robj, both from the Rmpi library.
>
> The jobs were started with rmpilaunch; it and the hosts file are
> included in the attachments. Connections are over TCP. rmpilaunch
> leaves me in an R session on the master. I invoked the code inside the
> toplevel() function toward the bottom of dbox-master.R.
>
> The program source files and other background information are in the
> attached file. n10 has the output of ompi_info --all, and n1011
> has other info for both nodes that were active (n10 was master; n11
> had some slaves).
>