Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Running simple MPI program
From: Brandon Fulcher (minguo_at_[hidden])
Date: 2010-10-23 12:33:13


Hi Gustavo, thank you for the response.

I have been using Linux for only a couple of years, so I'm not very familiar
with ssh. However, I followed the instructions on this site:
https://source.ggy.bris.ac.uk/wiki/Configure_ssh_for_MPI
and I can ssh into the remote machine without a password prompt. (At least,
if I enter ssh 192.168.0.6 on the command line it does not ask me for one,
and I can then run commands on the remote machine.)

As for /etc/hosts, again I'm not exactly familiar with it, but on both
machines the files are unchanged and have only an entry for localhost and
the local machine's name, and are otherwise identical.

127.0.0.1 localhost
127.0.1.1 <machine name>
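
A minimal sketch of how one could check what those two entries mean in
practice. The C program below is not from the original thread; the function
names and the HOSTCHECK_MAIN macro are hypothetical. It looks up the local
host name with getaddrinfo() and marks any address in 127.0.0.0/8. With an
/etc/hosts like the one above, the machine name would likely map to
127.0.1.1 rather than to its 192.168.0.x address. This only shows the
mapping; it is not a confirmed cause of the mpirun failure.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Return 1 if the dotted-quad IPv4 string lies in 127.0.0.0/8, 0 if it
 * does not, and -1 if the string is not a valid IPv4 address. */
int is_loopback_ipv4(const char *text)
{
    struct in_addr addr;

    assert(text != NULL);
    if (inet_pton(AF_INET, text, &addr) != 1)
        return -1;
    return (ntohl(addr.s_addr) >> 24) == 127 ? 1 : 0;
}

/* Print every IPv4 address the local host name resolves to.
 * Returns the number of loopback addresses seen, or -1 if the lookup fails. */
int report_self_resolution(void)
{
    char host[256];
    struct addrinfo hints, *res, *p;
    int loopbacks = 0;

    if (gethostname(host, sizeof host) != 0)
        return -1;
    host[sizeof host - 1] = '\0';   /* terminate in case of truncation */

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;      /* IPv4 only, to match the addresses above */
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, NULL, &hints, &res) != 0)
        return -1;

    for (p = res; p != NULL; p = p->ai_next) {
        char text[INET_ADDRSTRLEN];
        struct sockaddr_in *sin = (struct sockaddr_in *)p->ai_addr;

        if (inet_ntop(AF_INET, &sin->sin_addr, text, sizeof text) == NULL)
            continue;
        int lb = is_loopback_ipv4(text);
        printf("%s -> %s%s\n", host, text, lb == 1 ? " (loopback)" : "");
        loopbacks += (lb == 1);
    }
    freeaddrinfo(res);
    return loopbacks;
}

/* Hypothetical switch: define HOSTCHECK_MAIN to build a standalone program. */
#ifdef HOSTCHECK_MAIN
int main(void)
{
    return report_self_resolution() < 0 ? 1 : 0;
}
#endif
```

Built with something like "cc -DHOSTCHECK_MAIN hostcheck.c -o hostcheck"
(the file name is only an example) and run on each machine, it prints one
line per address the machine's own name resolves to.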

I have installed the Open MPI packages on both machines.
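
To go with the advice quoted below about different MPI installations taking
precedence on $PATH, here is a minimal sketch, assuming a POSIX system, of
how to see which mpicc and mpirun a shell would pick up first. The function
name first_in_path and the PATHCHECK_MAIN macro are hypothetical, and the
program only reports the first match; it does not compare versions.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copy into out the first "<dir>/<name>" from the colon-separated
 * path_list that is an executable regular file.
 * Returns 1 if one is found, 0 otherwise (out is then set to ""). */
int first_in_path(const char *path_list, const char *name,
                  char *out, size_t out_size)
{
    assert(path_list != NULL && name != NULL && out != NULL);

    const char *start = path_list;
    while (*start != '\0') {
        const char *end = strchr(start, ':');
        size_t len = end ? (size_t)(end - start) : strlen(start);

        if (len > 0) {
            struct stat st;
            int n = snprintf(out, out_size, "%.*s/%s", (int)len, start, name);

            if (n > 0 && (size_t)n < out_size &&
                stat(out, &st) == 0 && S_ISREG(st.st_mode) &&
                access(out, X_OK) == 0)
                return 1;
        }
        if (end == NULL)
            break;
        start = end + 1;            /* skip past the ':' separator */
    }
    if (out_size > 0)
        out[0] = '\0';
    return 0;
}

/* Hypothetical switch: define PATHCHECK_MAIN to build a standalone program. */
#ifdef PATHCHECK_MAIN
int main(void)
{
    const char *names[] = { "mpicc", "mpirun" };
    const char *path = getenv("PATH");
    char found[4096];

    for (size_t i = 0; i < 2; i++) {
        if (path != NULL && first_in_path(path, names[i], found, sizeof found))
            printf("%s -> %s\n", names[i], found);
        else
            printf("%s not found on PATH\n", names[i]);
    }
    return 0;
}
#endif
```

Running it on both machines and comparing the two lines of output shows
whether each one finds mpicc and mpirun in the same place.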

Running the command (mpirun -np ${whatever} hostname)
returns my local machine's name.

I have read most of the FAQ sections that seemed pertinent, including the
running, compiling, and troubleshooting sections. I am making no progress,
and I don't have much to go on since the error message doesn't provide
anything useful.

On Sat, Oct 23, 2010 at 10:10 AM, Gustavo Correa <gus_at_[hidden]> wrote:

> Hi Brandon
>
> You must have passwordless ssh set up across the machines.
> Check if you can ssh passwordless back and forth across all node pairs,
> with the host names or IPs you have in your host.txt file.
>
> Your /etc/hosts (or whatever Ubuntu uses to match hosts and IPs) must be
> consistent (perhaps the same) across the machines.
>
> The same (Open)MPI must be installed on all machines,
> or installed on an NFS directory mounted on all machines.
>
> Make sure you use the same MPI to compile (mpicc) and to
> run (mpiexec/mpirun). It is quite common to inadvertently mix up
> different flavors/versions, which may come with Linux distributions,
> commercial compilers, etc., and sometimes take precedence on your
> $PATH. When in doubt, use full path names for both mpicc and mpirun.
>
> It may be easier to run just "hostname" to check functionality:
>
> mpirun -np ${whatever} hostname
>
> If the Ubuntu package doesn't work ...
> It is easy to build OpenMPI from source, and choose an installation
> directory that doesn't interfere with the system (e.g. under your home
> directory).
> The README file and the FAQ have clear instructions for that.
> It builds fine with gcc/g++/gfortran, if free compilers are your concern.
>
> The OpenMPI FAQ has good suggestions for initial troubleshooting:
> http://www.open-mpi.org/faq/
>
> My $0.02
> Gus Correa
>
> On Oct 23, 2010, at 10:07 AM, Brandon Fulcher wrote:
>
> > Thank you for the response!
> >
> > The code runs on my own machine as well. Both machines, in fact. And I
> > did not build MPI but installed the package from the Ubuntu repositories.
> >
> > The problem occurs when I try to run a job using two machines or simply
> > try to run it on a slave from the master.
> >
> > The actual command I ran, along with its output, is below:
> >
> > mpirun -hostfile hosts.txt ilk
> > --------------------------------------------------------------------------
> > mpirun noticed that the job aborted, but has no info as to the process
> > that caused that situation.
> > --------------------------------------------------------------------------
> >
> > where hosts.txt contains:
> > 192.168.0.2 cpu=2
> > 192.168.0.6 cpu=1
> >
> >
> > If it matters, the same output is given if I define a remote host in the
> > command, such as (if I am on 192.168.0.2)
> > mpirun -host 192.168.0.6 ilk
> >
> > Now if I run it locally, the job succeeds. This works from either cpu.
> > mpirun ilk
> >
> >
> > Thanks in advance.
> >
> > On Fri, Oct 22, 2010 at 11:59 PM, David Zhang <solarbikedz_at_[hidden]> wrote:
> > Since you said you're new to MPI, what command did you use to run the 2
> > processes?
> >
> >
> > On Fri, Oct 22, 2010 at 9:58 PM, David Zhang <solarbikedz_at_[hidden]> wrote:
> > Your code works on my machine. It could be the way you built MPI.
> >
> > On Fri, Oct 22, 2010 at 7:26 PM, Brandon Fulcher <minguo_at_[hidden]> wrote:
> > Hi, I am completely new to MPI and am having trouble running a job
> > between two CPUs.
> >
> > The same thing happens no matter what MPI job I try to run, but here is a
> > simple 'hello world' style program I am trying to run.
> >
> > #include <mpi.h>
> > #include <stdio.h>
> > #include <unistd.h>   /* gethostname() */
> >
> > int main(int argc, char **argv)
> > {
> >     int rank;
> >     char hostname[256];
> >
> >     MPI_Init(&argc, &argv);
> >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >     /* Pass the full buffer size; terminate explicitly in case of truncation. */
> >     gethostname(hostname, sizeof hostname);
> >     hostname[sizeof hostname - 1] = '\0';
> >     printf("Hello world! I am process number: %d on host %s\n", rank, hostname);
> >     MPI_Finalize();
> >     return 0;
> > }
> >
> >
> > On either CPU, I can successfully compile and run, but when trying to run
> > the program using two CPUs it fails with this output:
> >
> > --------------------------------------------------------------------------
> > mpirun noticed that the job aborted, but has no info as to the process
> > that caused that situation.
> > --------------------------------------------------------------------------
> >
> >
> > With no additional information or errors, what can I do to go about
> > finding out what is wrong?
> >
> >
> >
> > I have read the FAQ and followed the instructions. I can ssh into the
> > slave without entering a password and have the libraries installed on both
> > machines.
> >
> > The only pertinent thing I could find is this FAQ entry:
> > http://www.open-mpi.org/faq/?category=running#missing-prereqs
> > but I do not know if it applies, since I have installed Open MPI from the
> > Ubuntu repositories and assume the libraries are correctly set.
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> >
> > --
> > David Zhang
> > University of California, San Diego