Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Running simple MPI program
From: Brandon Fulcher (minguo_at_[hidden])
Date: 2010-10-23 12:58:44


Ah Jeff, maybe you are on to something.

So now that I understand what you mean, launching

mpirun -np 3 -hostfile hosts.txt hostname

returns two copies of the local system name, and then the by-now-very-familiar

--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
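
One thing I am also double-checking: my hosts.txt uses the cpu= keyword, while the hostfile examples in the Open MPI FAQ use slots=. In case my version only recognizes the latter, a hostfile along those lines (with my IPs) would look like:

```
192.168.0.2 slots=2
192.168.0.6 slots=1
```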

So I checked the OMPI package details on both machines; each is running Open
MPI 1.3... but then I noticed that the packages are different versions.
Basically, the slave is running the previous Ubuntu release and the master is
running the current one. Both have the most recent packages for their
release... but perhaps that is enough of a difference?
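
To make the comparison concrete, I will pull the exact version from each machine (assuming mpirun --version, or dpkg -l openmpi-bin, reports it) and compare. A trivial sketch of that check, with hypothetical version strings filled in by hand:

```shell
# Hypothetical version strings; in practice these would come from
#   mpirun --version                     (on the master), and
#   ssh 192.168.0.6 mpirun --version     (on the slave)
local_ver="1.3.2"
remote_ver="1.3.3"

# Open MPI generally wants matching versions across all nodes of a job,
# so any difference here is worth eliminating first.
if [ "$local_ver" = "$remote_ver" ]; then
    echo "versions match: $local_ver"
else
    echo "version mismatch: $local_ver vs $remote_ver"
fi
```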

On Sat, Oct 23, 2010 at 11:43 AM, Jeff Squyres (jsquyres) <
jsquyres_at_[hidden]> wrote:

> What if you run with 2 hosts?
>
> It's unusual that no indication of the actual error is shown.
>
> Are you running exactly the same version of OMPI on both nodes?
>
>
> Sent from my PDA. No type good.
>
> On Oct 23, 2010, at 12:37 PM, "Brandon Fulcher" <minguo_at_[hidden]> wrote:
>
> Hi Jeff, thanks for responding.
>
> mpirun hostname returns the name of the local machine.
>
> On Sat, Oct 23, 2010 at 11:27 AM, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]> wrote:
>
>> I didn't notice if it came up earlier - are you running the same version
>> of OMPI on each node?
>>
>> What happens if you try mpirunning hostname (i.e. not an MPI app)?
>>
>> Sent from my PDA. No type good.
>>
>> On Oct 23, 2010, at 12:07 PM, "Brandon Fulcher" <minguo_at_[hidden]> wrote:
>>
>> Hi Jody, thank you for the response.
>>
>> Specifying the number of processes in the manner you provided
>> (mpirun -np 2 -hostfile hosts.txt ilk)
>>
>> does indeed succeed. All processes are launched on my local machine, which
>> has two slots. If I change the command to
>>
>> mpirun -np 3 -hostfile hosts.txt ilk
>>
>> it fails, however, giving the same error:
>>
>> --------------------------------------------------------------------------
>> mpirun noticed that the job aborted, but has no info as to the process
>> that caused that situation.
>> --------------------------------------------------------------------------
>>
>>
>> On Sat, Oct 23, 2010 at 10:13 AM, jody <jody.xha_at_[hidden]> wrote:
>>
>>> Hi Brandon
>>> Does it work if you try this:
>>> mpirun -np 2 -hostfile hosts.txt ilk
>>>
>>> (see http://www.open-mpi.org/faq/?category=running#simple-spmd-run)
>>>
>>> jody
>>>
>>> On Sat, Oct 23, 2010 at 4:07 PM, Brandon Fulcher <minguo_at_[hidden]> wrote:
>>> > Thank you for the response!
>>> >
>>> > The code runs on my own machine as well. Both machines, in fact. And I
>>> > did not build MPI myself but installed the package from the Ubuntu
>>> > repositories.
>>> >
>>> > The problem occurs when I try to run a job using two machines, or
>>> > simply try to run it on a slave from the master.
>>> >
>>> > The actual command I ran, along with its output, is below:
>>> >
>>> > mpirun -hostfile hosts.txt ilk
>>> >
>>> --------------------------------------------------------------------------
>>> > mpirun noticed that the job aborted, but has no info as to the process
>>> > that caused that situation.
>>> >
>>> --------------------------------------------------------------------------
>>> >
>>> > where hosts.txt contains:
>>> > 192.168.0.2 cpu=2
>>> > 192.168.0.6 cpu=1
>>> >
>>> >
>>> > If it matters, the same output is given if I specify a remote host in
>>> > the command, such as (if I am on 192.168.0.2):
>>> > mpirun -host 192.168.0.6 ilk
>>> >
>>> > Now if I run it locally, the job succeeds. This works on either machine.
>>> > mpirun ilk
>>> >
>>> >
>>> > Thanks in advance.
>>> >
>>> > On Fri, Oct 22, 2010 at 11:59 PM, David Zhang <solarbikedz_at_[hidden]> wrote:
>>> >>
>>> >> Since you said you're new to MPI, what command did you use to run the
>>> >> 2 processes?
>>> >>
>>> >> On Fri, Oct 22, 2010 at 9:58 PM, David Zhang <solarbikedz_at_[hidden]> wrote:
>>> >>>
>>> >>> Your code works on my machine. Could be the way you built MPI.
>>> >>>
>>> >>> On Fri, Oct 22, 2010 at 7:26 PM, Brandon Fulcher <minguo_at_[hidden]> wrote:
>>> >>>>
>>> >>>> Hi, I am completely new to MPI and am having trouble running a job
>>> >>>> across two machines.
>>> >>>>
>>> >>>> The same thing happens no matter what MPI job I try to run, but here
>>> >>>> is a simple 'hello world' style program I am trying to run.
>>> >>>>
>>> >>>> #include <mpi.h>
>>> >>>> #include <stdio.h>
>>> >>>> #include <unistd.h>   /* for gethostname() */
>>> >>>>
>>> >>>> int main(int argc, char **argv)
>>> >>>> {
>>> >>>>     int rank;
>>> >>>>     char hostname[256];
>>> >>>>
>>> >>>>     MPI_Init(&argc, &argv);
>>> >>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>> >>>>     gethostname(hostname, 255);
>>> >>>>     printf("Hello world! I am process number: %d on host %s\n",
>>> >>>>            rank, hostname);
>>> >>>>     MPI_Finalize();
>>> >>>>     return 0;
>>> >>>> }
>>> >>>>
>>> >>>>
>>> >>>> On either machine, I can successfully compile and run, but when
>>> >>>> trying to run the program across both machines it fails with this
>>> >>>> output:
>>> >>>>
>>> >>>>
>>> >>>>
>>> --------------------------------------------------------------------------
>>> >>>> mpirun noticed that the job aborted, but has no info as to the
>>> process
>>> >>>> that caused that situation.
>>> >>>>
>>> >>>>
>>> --------------------------------------------------------------------------
>>> >>>>
>>> >>>>
>>> >>>> With no additional information or errors, what can I do to find out
>>> >>>> what is wrong?
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> I have read the FAQ and followed the instructions. I can ssh into
>>> the
>>> >>>> slave without entering a password and have the libraries installed
>>> on both
>>> >>>> machines.
>>> >>>>
>>> >>>> The only thing pertinent I could find is this FAQ entry:
>>> >>>> http://www.open-mpi.org/faq/?category=running#missing-prereqs but I
>>> >>>> do not know if it applies, since I have installed Open MPI from the
>>> >>>> Ubuntu repositories and assume the libraries are correctly set up.
>>> >>>>
>>> >>>> _______________________________________________
>>> >>>> users mailing list
>>> >>>> users_at_[hidden]
>>> >>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> David Zhang
>>> >>> University of California, San Diego
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> David Zhang
>>> >> University of California, San Diego
>>> >>
>>> >
>>> >
>>>
>>
>