Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Error running program : mca_oob_tcp_msg_send_handler: writev:failed: Bad file descriptor
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-01-16 11:10:01


If you log in to clienteprueba and try to ping pruebaborja, can you do it? What network is it using?
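As a companion to the ping test, here is a minimal sketch in C that checks what address a hostname resolves to on the node where it runs. The function name `resolve_ipv4` is hypothetical and not part of Open MPI; it only wraps the standard `getaddrinfo()` call, and it assumes IPv4 is the network in use.

```c
/* Ask glibc to expose getaddrinfo() even under strict -std= modes. */
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif

#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: resolve `host` to its first IPv4 address and
 * write the dotted-quad text into `out` (at least INET_ADDRSTRLEN bytes).
 * Returns 0 on success and -1 on any failure. */
int resolve_ipv4(const char *host, char *out, size_t outlen)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;       /* IPv4 only (assumption) */
    hints.ai_socktype = SOCK_STREAM; /* one result per address */

    int rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0) {
        fprintf(stderr, "%s: %s\n", host, gai_strerror(rc));
        return -1;
    }

    /* Convert the first returned address to text. */
    const struct sockaddr_in *sin = (const struct sockaddr_in *)res->ai_addr;
    const char *text = inet_ntop(AF_INET, &sin->sin_addr, out, (socklen_t)outlen);
    freeaddrinfo(res);

    return text != NULL ? 0 : -1;
}
```

Calling `resolve_ipv4("pruebaborja", buf, sizeof buf)` from a small `main` on clienteprueba, and again on clientepruebados, would show whether both clients see the master at the same address. A successful lookup only proves that the name resolves; ping is still needed to confirm that the address is reachable.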

Sometimes the problem is that you have multiple ethernet interfaces on the machines and we pick the wrong one - i.e., one that cannot connect to the other machine. There are ways to help resolve the problem if that's the case, but first check to see.
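To see which interfaces a node actually has, a short C sketch such as the following can be run on each machine. The function name `list_ipv4_interfaces` is hypothetical; it relies on `getifaddrs()`, which is available on Linux and the BSDs but is not part of POSIX, and it only reports IPv4 addresses.

```c
/* Ask glibc to expose IFF_* flags even under strict -std= modes. */
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif

#include <arpa/inet.h>
#include <ifaddrs.h>
#include <net/if.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: print one line per IPv4 interface address
 * (name, address, up/down, loopback marker) to `out`.
 * Returns the number of lines printed, or -1 if getifaddrs() fails. */
int list_ipv4_interfaces(FILE *out)
{
    struct ifaddrs *list = NULL;
    int count = 0;

    if (getifaddrs(&list) != 0) {
        perror("getifaddrs");
        return -1;
    }

    for (const struct ifaddrs *ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        /* Skip entries without an address and non-IPv4 families. */
        if (ifa->ifa_addr == NULL || ifa->ifa_addr->sa_family != AF_INET)
            continue;

        char addr[INET_ADDRSTRLEN];
        const struct sockaddr_in *sin =
            (const struct sockaddr_in *)ifa->ifa_addr;
        if (inet_ntop(AF_INET, &sin->sin_addr, addr, sizeof addr) == NULL)
            continue;

        fprintf(out, "%-10s %-15s %s%s\n", ifa->ifa_name, addr,
                (ifa->ifa_flags & IFF_UP) ? "up" : "down",
                (ifa->ifa_flags & IFF_LOOPBACK) ? " loopback" : "");
        count++;
    }

    freeifaddrs(list);
    return count;
}
```

Calling `list_ipv4_interfaces(stdout)` from `main` on clienteprueba and on clientepruebados makes it easy to compare the two. If clienteprueba shows an extra interface on a subnet the master cannot reach, that interface is a likely candidate for the wrong pick. Open MPI's TCP components accept MCA parameters such as `btl_tcp_if_include` and `oob_tcp_if_include` to restrict which interfaces are used; the exact names available in a given release can be checked with `ompi_info`.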

Also, if you configure OMPI with --enable-debug, there are diagnostics you can enable that will help debug the problem.

On Jan 16, 2013, at 7:59 AM, borja mf <borjamunozfernandez_at_[hidden]> wrote:

> Getting the same error...
> I forgot to say that I must use Ubuntu and I'm compiling with mpicc. My code is written in C.
>
> Thanks for the answer.
>
> I'm going crazy with this problem. There's not much info about it.
>
> 2013/1/16 Jeff Squyres (jsquyres) <jsquyres_at_[hidden]>
> Try disabling firewalling between your nodes. The easiest way is "sudo service iptables stop".
>
>
> On Jan 16, 2013, at 7:46 AM, borja mf <borjamunozfernandez_at_[hidden]>
> wrote:
>
> > Hello all.
> > I want to learn MPI and I'm trying to set up OMPI for the first time on three nodes. My config is below:
> > Ubuntu server - master node: pruebaborja
> > 2x Ubuntu Desktop - slave nodes:
> > clienteprueba
> > clientepruebados 4 slots
> >
> > I'm running NFSv4 for sharing /home/mpiuser.
> > I want to test a plain "Hello world" but I can't get it working on node clienteprueba. This is the problem:
> >
> > mpiuser_at_pruebaborja:~$ mpirun -np 6 --hostfile .mpi_hostfile ./holamundo
> > [clienteprueba:01993] [[64434,0], 2] -> [[64434,0],0] mca_oob_tcp_msg_send_handler: writev:failed: Bad file descriptor (9) [sd = 9]
> > [clienteprueba:01993] [[64434,0], 2] routed:binomial: Connection to lifeline [[64434,0],0] lost
> >
> > However, with clientepruebados and pruebaborja only on my hostfile, it works:
> >
> > pruebaborja slots=1
> > clientepruebados slots=4
> > #clienteprueba slots=1
> >
> > mpiuser_at_pruebaborja:~$ mpirun -np 6 --hostfile .mpi_hostfile ./holamundo
> > Hola, mundo, soy pruebaborja: 0 de 6
> > Hola, mundo, soy pruebaborja: 5 de 6
> > Hola, mundo, soy clientepruebados: 1 de 6
> > Hola, mundo, soy clientepruebados: 2 de 6
> > Hola, mundo, soy clientepruebados: 3 de 6
> > Hola, mundo, soy clientepruebados: 4 de 6
> >
> > I've checked the OMPI versions on the machines and they are the same. I can't understand why I'm getting this error on clienteprueba; I've done the same config on clientepruebados and clienteprueba. Could anyone help me solve this?
> >
> > Sorry for my English.
> > Thanks in advance
> >
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>