Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OPENMPI_ORTE_LOG_ERROR
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2013-01-23 08:37:36


I'm not sure I understand you. Does Open MPI work across multiple machines? I.e., can you do all three of those steps across multiple machines?
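
For reference, the three README verification steps quoted below could be sketched as a small shell function. This is only an illustration under stated assumptions, not a command sequence taken from the thread: the hostfile name `hosts`, the process count of 4, the `./examples/` paths (which assume hello_c and ring_c were already built with "make" in the examples/ directory), and the `MPIRUN`/`HOSTFILE`/`NP` variables are all hypothetical.

```shell
#!/bin/sh
# Sketch of the three README verification steps, run across the hosts
# listed in a hostfile. Assumed (not from the thread): hostfile "hosts",
# 4 processes, and hello_c / ring_c already built in ./examples/.
HOSTFILE="${HOSTFILE:-hosts}"
NP="${NP:-4}"
# Set MPIRUN="echo mpirun" to print the commands instead of launching them.
MPIRUN="${MPIRUN:-mpirun}"

verify_openmpi() {
    # 1. Non-MPI program: checks that remote launch itself works.
    $MPIRUN -np "$NP" -hostfile "$HOSTFILE" hostname || return 1
    # 2. Trivial MPI program that does no MPI communication.
    $MPIRUN -np "$NP" -hostfile "$HOSTFILE" ./examples/hello_c || return 2
    # 3. Trivial MPI program that sends and receives a few messages.
    $MPIRUN -np "$NP" -hostfile "$HOSTFILE" ./examples/ring_c || return 3
}
```

Calling `verify_openmpi` after sourcing this runs the steps in order and stops at the first failure; the return code (1, 2, or 3) indicates which step failed, which helps separate a launch problem from an MPI communication problem.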

On Jan 23, 2013, at 8:16 AM, Ada Mancuso <mancuso.ada_at_[hidden]>
 wrote:

> I'm sure that Open MPI works; moreover, my problem happens only with more than 2 slaves on different machines, while locally it works fine with any number of slaves.
> Thanks
> Ada
>
> On 23 Jan 2013 14:04, "Jeff Squyres (jsquyres)" <jsquyres_at_[hidden]> wrote:
> Are you able to run the C examples in the examples/ directory from the tarball?
>
> Our README suggests the following:
>
> -----
> When verifying a new Open MPI installation, we recommend running three
> tests:
>
> 1. Use "mpirun" to launch a non-MPI program (e.g., hostname or uptime)
> across multiple nodes.
>
> 2. Use "mpirun" to launch a trivial MPI program that does no MPI
> communication (e.g., the hello_c program in the examples/ directory
> in the Open MPI distribution).
>
> 3. Use "mpirun" to launch a trivial MPI program that sends and
> receives a few MPI messages (e.g., the ring_c program in the
> examples/ directory in the Open MPI distribution).
>
> If you can run all three of these tests successfully, that is a good
> indication that Open MPI built and installed properly.
> -----
>
>
> On Jan 23, 2013, at 7:41 AM, Ada Mancuso <mancuso.ada_at_[hidden]>
> wrote:
>
> > Hi,
> > I've installed the latest snapshot taken from the svn developer's trunk, but I had the same problems. This is my configuration:
> > • Ubuntu 2.6.38-8 kernel
> > • OpenSSH_5.8p1, OpenSSL 0.9.8o
> > • Libtool version 2.4
> > • Open MPI 1.7 rc5 and latest snapshots.
> > Do you think my problem could be related to the operating system used, or to some parameter or configuration? I've also checked the ssh log file but I didn't find any problems.
> > Thanks in advance
> > Ada
> >
> >
> >
> > On Tuesday, 22 January 2013, Ralph Castain wrote:
> > >
> > > Ouch - no, you'd have to take it from the developer's trunk, either via svn checkout or the nightly developer's snapshot
> > >
> > > On Jan 22, 2013, at 12:35 PM, Ada Mancuso <mancuso.ada_at_[hidden]> wrote:
> > >
> > > My problem is that I have to use Open MPI 1.7 rc5 because I'm using the Java binding mpijava... Is it present in the latest snapshot you told me about? If so, where can I find it?
> > > Thanks a lot
> > > Ada
> > >
> > > On 22 Jan 2013 21:03, "Ralph Castain" <rhc_at_[hidden]> wrote:
> > >>
> > >> It seems to be working fine for me with the latest 1.7 tarball (not rc5 - I didn't test that one). Could be there was a problem that has since been fixed. We are getting ready to release an updated rc, so you might want to try it (or use the latest nightly 1.7 snapshot).
> > >>
> > >>
> > >> On Jan 22, 2013, at 9:57 AM, Ada Mancuso <mancuso.ada_at_[hidden]> wrote:
> > >>
> > >> Hi,
> > >> I'm trying to run my MPI program using Open MPI 1.7 rc5 on 4 machines with the command:
> > >> mpirun -np 4 -hostfile file a.out
> > >> but I get the following error messages:
> > >> ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file ../../../../../ompi/orte/mca/rml/oob/rml_oob_send.c
> > >> attempted to send to [[21341,0],2]: tag 15
> > >> ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file ../../../../ompi/orte/mca/grpcomm/base/grpcomm_base_xcast.c
> > >> The /etc/hosts file consists of "ipaddress hostname" entries. I have exchanged ssh keys among the machines, and ssh login works without asking for a password. Surprisingly, if I run my program with at most 2 hosts (so the hostfile contains only two hosts), it works, but if I run it with more than two hosts I get this error. MPI works well on each machine, and I also tried running my program with different pairs of machines to be sure that no single machine is the problem.
> > >> Can you help me please?
> > >> Ada
> > >> _______________________________________________
> > >> users mailing list
> > >> users_at_[hidden]
> > >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> > >
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>
>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/