Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Can't start program across network
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-03-14 08:55:21


Can you send all the information here:

     http://www.open-mpi.org/community/help/

(including the network information)

Thanks!

On Mar 13, 2009, at 9:12 PM, Raymond Wan wrote:

>
> Hi Jeff,
>
>
> Jeff Squyres wrote:
> > On Mar 13, 2009, at 6:17 AM, Raymond Wan wrote:
> >
> >> What doesn't work is:
> >>
> >> [On Y] mpirun --host Y,Z --np 2 uname -a
> >> [On Y] mpirun --host X,Y,Z --np 3 uname -a
> >>
> >> ...and similarly for machine Z. I can confirm that from any of the 3
> >
> > Do you see "rsh" or "ssh" in the output of "ps -eadf" when mpirun is
> > hanging, perchance? If you do, what happens if you copy-n-paste those
> > command lines and run them manually?
> >
>
>
> No, I don't see either rsh or ssh when mpirun is hanging. Is that
> odd? Something I'm doing wrong?
>
> I only see an mpirun command and an orted command.
>
>
> rwan 22800 22761 0 09:52 pts/2 00:00:00 mpirun --host X,Y,Z --np 3 sleep 1000
> rwan 22804 1 0 09:52 ? 00:00:00 orted --bootproxy 1 --name 0.0.2 --num_procs 4 --vpid_start 0 --nodename Y --universe rwan_at_Y:default-universe-22800 --nsreplica "0.0.0;tcp://Y:36889" --gprreplica "0.0.0;tcp://Y:36889" --set-sid
>
>
> Actually, when I run the above mpirun command, I don't see "sleep"
> running locally on machine Y, either. However, if I did this:
>
> mpirun --host Y --np 3 sleep 1000
>
> I see 3 instances of "sleep" when I do ps -aedf. Does mpirun try to
> "ssh" to all networked machines first before it starts the program
> (even if one of those instances will run locally)? Perhaps
> unrelated...but when I am on Y and I do an rsh to Z, I get a "No
> route to host". I asked the sysadmin about it (I'm not the sysadmin
> of Y or Z) and he doesn't know why, but as we should be using ssh
> anyway, he isn't going to address the problem (unless it is a
> side-effect of my mpirun problem). I only presume rsh hasn't been set up
> properly; ssh works fine, though.
>
> Thank you!
>
> Ray
>
>

-- 
Jeff Squyres
Cisco Systems