Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OPENMPI_ORTE_LOG_ERROR
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-01-23 10:30:53


I suspect the problem is that the rsh/ssh launcher is attempting to use a tree pattern for launching the apps - i.e., mpirun launches a daemon on the first couple of nodes, and then those daemons launch daemons on the next level. If rsh/ssh isn't supported on those backend nodes, then this won't work.
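
To make the pattern above concrete, here is a toy model of who launches whom. It is only a sketch: the fan-out of 2, the host names, and the function names are assumptions made for illustration, and Open MPI's real launch tree may be shaped differently. It shows that under a tree launch some backend nodes must themselves start rsh/ssh sessions, while under a flat launch only mpirun does.

```python
def launch_plan(hosts, fanout=2, tree=True):
    """Return (launcher, target) pairs for starting one daemon per host.

    Toy model only: a simple k-ary tree with an assumed fan-out,
    not Open MPI's actual launch algorithm.
    """
    if not tree:
        # Flat launch: mpirun starts every daemon itself.
        return [("mpirun", host) for host in hosts]
    plan = []
    for i, host in enumerate(hosts):
        if i < fanout:
            # First level: started directly by mpirun.
            launcher = "mpirun"
        else:
            # Deeper levels: started by an already-launched daemon.
            launcher = hosts[(i - fanout) // fanout]
        plan.append((launcher, host))
    return plan


def hosts_needing_outbound_ssh(plan):
    """Backend hosts that must be able to open rsh/ssh sessions themselves."""
    return {launcher for launcher, _ in plan if launcher != "mpirun"}


if __name__ == "__main__":
    hosts = ["node1", "node2", "node3"]  # hypothetical host names
    for tree in (True, False):
        plan = launch_plan(hosts, tree=tree)
        print("tree" if tree else "flat", plan, hosts_needing_outbound_ssh(plan))
```

In this toy model, two hosts are both reached directly by mpirun, and a third host is the first one that depends on a backend node being able to launch over rsh/ssh.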

Try running it with "-mca plm_rsh_no_tree_spawn 1" on your command line. This instructs Open MPI not to use a tree pattern, but to have mpirun launch the daemons directly itself.
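
As a sketch of where the option goes, the snippet below assembles the full command line. The process count (4), the host file name "file", and the program "a.out" are taken from the original post in this thread; the helper name build_mpirun_command is hypothetical, and the snippet only prints the command rather than running it.

```python
import shlex


def build_mpirun_command(num_procs, hostfile, program, no_tree_spawn=False):
    """Assemble an mpirun argument list (hypothetical helper for illustration)."""
    cmd = ["mpirun"]
    if no_tree_spawn:
        # MCA parameter suggested above: have mpirun launch all daemons itself.
        cmd += ["-mca", "plm_rsh_no_tree_spawn", "1"]
    cmd += ["-np", str(num_procs), "-hostfile", hostfile, program]
    return cmd


if __name__ == "__main__":
    cmd = build_mpirun_command(4, "file", "a.out", no_tree_spawn=True)
    # Print the command so it can be copied into a shell.
    print(shlex.join(cmd))
```

Keeping the arguments as a list avoids quoting problems if the host file or program path contains spaces.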

On Jan 23, 2013, at 5:41 AM, Ada Mancuso <mancuso.ada_at_[hidden]> wrote:

> Yes, I can, but only with at most two machines as slaves and one machine as master. If I try to add another slave, I get those errors.
>
> On Jan 23, 2013, at 14:38, "Jeff Squyres (jsquyres)" <jsquyres_at_[hidden]> wrote:
> I'm not sure I understand you. Does Open MPI work across multiple machines? I.e., can you do all three of those steps across multiple machines?
>
> On Jan 23, 2013, at 8:16 AM, Ada Mancuso <mancuso.ada_at_[hidden]>
> wrote:
>
> > I'm sure that Open MPI works; moreover, my problem happens only with more than 2 slaves on different machines, while locally it works fine with any number of slaves.
> > Thanks
> > Ada
> >
> > On Jan 23, 2013, at 14:04, "Jeff Squyres (jsquyres)" <jsquyres_at_[hidden]> wrote:
> > Are you able to run the C examples in the examples/ directory from the tarball?
> >
> > Our README suggests the following:
> >
> > -----
> > When verifying a new Open MPI installation, we recommend running three
> > tests:
> >
> > 1. Use "mpirun" to launch a non-MPI program (e.g., hostname or uptime)
> > across multiple nodes.
> >
> > 2. Use "mpirun" to launch a trivial MPI program that does no MPI
> > communication (e.g., the hello_c program in the examples/ directory
> > in the Open MPI distribution).
> >
> > 3. Use "mpirun" to launch a trivial MPI program that sends and
> > receives a few MPI messages (e.g., the ring_c program in the
> > examples/ directory in the Open MPI distribution).
> >
> > If you can run all three of these tests successfully, that is a good
> > indication that Open MPI built and installed properly.
> > -----
> >
> >
> > On Jan 23, 2013, at 7:41 AM, Ada Mancuso <mancuso.ada_at_[hidden]>
> > wrote:
> >
> > > Hi,
> > > I've installed the latest snapshot taken from the svn developer's trunk, but I have the same problems. This is my configuration:
> > > • Ubuntu 2.6.38-8 kernel
> > > • Openssh_5.8p1 openssl 0.9.8o
> > > • Libtool version 2.4
> > > • Open MPI 1.7 rc5 and latest snapshots.
> > > Do you think my problem could be related to the operating system or to some parameter or configuration setting? I've also checked the ssh log file, but I didn't find any problem.
> > > Thanks in advance
> > > Ada
> > >
> > >
> > >
> > > On Tuesday, January 22, 2013, Ralph Castain wrote:
> > > >
> > > > Ouch - no, you'd have to take it from the developer's trunk, either via svn checkout or the nightly developer's snapshot
> > > >
> > > > On Jan 22, 2013, at 12:35 PM, Ada Mancuso <mancuso.ada_at_[hidden]> wrote:
> > > >
> > > > My problem is that I have to use Open MPI 1.7 rc5 because I'm using the Java binding mpijava... Is it present in the latest snapshot you mentioned? If so, where can I find it?
> > > > Thanks a lot
> > > > Ada
> > > >
> > > > On Jan 22, 2013, at 21:03, "Ralph Castain" <rhc_at_[hidden]> wrote:
> > > >>
> > > >> It seems to be working fine for me with the latest 1.7 tarball (not rc5 - I didn't test that one). Could be there was a problem that has since been fixed. We are getting ready to release an updated rc, so you might want to try it (or use the latest nightly 1.7 snapshot).
> > > >>
> > > >>
> > > >> On Jan 22, 2013, at 9:57 AM, Ada Mancuso <mancuso.ada_at_[hidden]> wrote:
> > > >>
> > > >> Hi,
> > > >> I'm trying to run my MPI program using Open MPI 1.7 rc5 on 4 machines using the command:
> > > >> mpirun -np 4 -hostfile file a.out
> > > >> but I get the following error messages:
> > > >> ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file ../../../../../ompi/orte/mca/rml/oob/rml_oob_send.c
> > > >> attempted to send to [[21341,0],2]: tag 15
> > > >> ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file ../../../../ompi/orte/mca/grpcomm/base/grpcomm_base_xcast.c
> > > >> The file etc/hosts consists of "ipaddress hostname" entries. I have exchanged ssh keys among the machines, and ssh login works without asking for a password. Surprisingly, if I run my program with at most 2 hosts (so that the hosts file contains only two hosts), it works, but if I run it with more than two hosts I get this error. MPI works well on each machine, and I also tried running my program with different pairs of machines to make sure that no single machine is the problem.
> > > >> Can you help me please?
> > > >> Ada
> > > >> _______________________________________________
> > > >> users mailing list
> > > >> users_at_[hidden]
> > > >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> > --
> > Jeff Squyres
> > jsquyres_at_[hidden]
> > For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/