Open MPI User's Mailing List Archives

Subject: [OMPI users] Ang: Re: submitted job stops
From: danesh.d_at_[hidden]
Date: 2008-04-09 08:15:34


Actually, my program is a very simple MPI "Hello World" program that
just prints the rank of each process and then terminates. When I run
it on a single-processor machine with, e.g., 4 processes
(oversubscribing), it prints:

Hello world from processor with rank 0
Hello world from processor with rank 3
Hello world from processor with rank 1
Hello world from processor with rank 2

but when I use my remote machines everything just stops when
I run the program.

No, I do not use any queuing system. I simply run it like this:

mpirun -np 4 --hostfile hosts ./hw
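
The contents of the "hosts" file passed to --hostfile are not shown in
this message. As a rough sketch only, a hostfile for one master and two
slaves might look like the following, assuming the hypothetical host
names master, slave1 and slave2 and slot counts chosen to add up to the
4 processes requested with -np 4:

```
# Hypothetical hostfile: host names and slot counts are assumptions,
# not taken from the original message.
master slots=2
slave1 slots=1
slave2 slots=1
```

Each line names one machine, and the optional slots=N value tells
mpirun how many processes it may place there.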

and then it just stops until I terminate it manually. As I said,
I monitored all machines (master + 2 slaves) and found that
on all of them the "orted" daemon starts when I run the program, but
after a few seconds the daemon terminates. What could be the reason?

Thanks,

Danesh

>----Original message----
>From: reuti_at_[hidden]
>Date: 09-04-2008 13:26
>To: "Open MPI Users"<users_at_[hidden]>
>Subject: Re: [OMPI users] submitted job stops
>
>Hi,
>
>On 08.04.2008 at 21:58, Danesh Daroui wrote:
>> I had posted a message about my problem and tried all the suggested
>> solutions, but the problem is still not solved. I have installed
>> Open MPI on three machines (1 master + 2 slaves). When I submit a job
>> to the master, I can see (by running "top" on all machines) that the
>> "orted" daemon is launched on every machine, but all "orted" daemons
>> terminate after a few seconds and nothing happens. At first I thought
>> the remote machines could not launch "orted", but now I am sure that
>> it can be run on all machines without problems. Any suggestions?
>
>The question is rather: is your MPI program running successfully, or is
>there simply no output from mpiexec/mpirun? And by "submit", do you mean
>that you use a queuing system?
>
>-- Reuti
>_______________________________________________
>users mailing list
>users_at_[hidden]
>http://www.open-mpi.org/mailman/listinfo.cgi/users
>