Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] submitted job stops
From: Mark Kosmowski (mark.kosmowski_at_[hidden])
Date: 2008-04-09 12:09:56


Danesh:

Have you tried running "mpirun -np 4 --hostfile hosts hostname" to verify
that Open MPI itself is working?
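
A minimal sketch of that check as a shell function, in case it helps.
The function name ompi_smoke_test is made up for illustration, and it
assumes mpirun is on the PATH and that "hosts" is your hostfile. It runs
the same command and compares the number of lines printed with the
number of processes requested.

```shell
# Sketch: launch `hostname` through mpirun and check that one line comes
# back per requested process.
# Arguments: $1 = process count, $2 = hostfile path.
# (ompi_smoke_test is a hypothetical helper name, not part of Open MPI.)
ompi_smoke_test() {
    np=$1
    hostfile=$2
    # Same command as suggested above; the output is captured for counting.
    output=$(mpirun -np "$np" --hostfile "$hostfile" hostname) || return 1
    # No output at all counts as a failure.
    [ -n "$output" ] || return 1
    printf '%s\n' "$output"
    lines=$(printf '%s\n' "$output" | wc -l | tr -d ' ')
    # Succeed only if every process reported a host name.
    [ "$lines" -eq "$np" ]
}
```

Called as "ompi_smoke_test 4 hosts", it should print four host names and
return success. If this also hangs, the problem is more likely in how
the processes are launched than in the MPI program itself.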

Can you log in remotely (for example with ssh) from each node to every
other node?
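
One way to test this is sketched below, assuming ssh is the remote
shell used for launching and that you can already reach every node from
the machine where you run it. The function name check_ssh_mesh and the
host names in the usage line are placeholders. BatchMode=yes makes ssh
fail instead of prompting for a password, so a node pair that needs
interactive authentication shows up as FAIL.

```shell
# Sketch: for every ordered pair of hosts, log in to the first and from
# there try to log in to the second.
# Arguments: the host names to test.
# (check_ssh_mesh is a hypothetical helper name.)
check_ssh_mesh() {
    failures=0
    for src in "$@"; do
        for dst in "$@"; do
            # Skip the trivial pair of a host with itself.
            [ "$src" = "$dst" ] && continue
            # stdin comes from /dev/null so ssh cannot wait for input.
            if ssh -o BatchMode=yes -o ConnectTimeout=5 "$src" \
                   ssh -o BatchMode=yes -o ConnectTimeout=5 "$dst" true \
                   </dev/null; then
                echo "ok   $src -> $dst"
            else
                echo "FAIL $src -> $dst"
                failures=$((failures + 1))
            fi
        done
    done
    # Return success only if every pair worked.
    [ "$failures" -eq 0 ]
}
```

Usage would be something like "check_ssh_mesh master slave1 slave2",
with your real host names substituted.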

If any node has more than one network device, are you using the Open MPI
options to specify which device to use?
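
For the TCP transport, the MCA parameter btl_tcp_if_include restricts
MPI traffic to the named interface(s). The wrapper below is only a
sketch: the function name mpirun_on_iface is made up, and "eth0" in the
usage line is an assumed interface name that you would replace with
whatever your nodes actually use.

```shell
# Sketch: run mpirun with MPI TCP traffic restricted to one interface.
# Arguments: $1 = interface name, remaining arguments = normal mpirun
# arguments.
# (mpirun_on_iface is a hypothetical helper name.)
mpirun_on_iface() {
    iface=$1
    shift
    # btl_tcp_if_include limits the TCP BTL to the listed interface(s).
    mpirun --mca btl_tcp_if_include "$iface" "$@"
}
```

For example: "mpirun_on_iface eth0 -np 4 --hostfile hosts ./hw".
Running "ompi_info --param btl tcp" should list the TCP parameters
that your installation supports.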

Good luck,

Mark

> Message: 5
> Date: Wed, 9 Apr 2008 14:15:34 +0200 (CEST)
> From: "danesh.d_at_[hidden]" <danesh.d_at_[hidden]>
> Subject: [OMPI users] Ang: Re: submitted job stops
> To: <users_at_[hidden]>
> Message-ID:
> <24351656.56761207743334738.JavaMail.defaultUser_at_defaultHost>
> Content-Type: text/plain;charset="ISO-8859-15"
>
>
> Actually, my program is a very simple MPI "Hello World" program which
> just prints the rank of each process and then terminates. When I run
> my program on a single-processor machine with e.g. 4 processes
> (oversubscribing) it shows:
>
> Hello world from processor with rank 0
> Hello world from processor with rank 3
> Hello world from processor with rank 1
> Hello world from processor with rank 2
>
> but when I use my remote machines everything just stops when
> I run the program.
>
> No, I do not use any queuing system. I simply run it like this:
>
> mpirun -np 4 --hostfile hosts ./hw
>
> and then it just stops until I terminate it manually. As I said,
> I monitored all machines (master + 2 slaves) and found that
> on all machines the "orted" daemon starts when I run the program, but
> after a few seconds the daemon terminates. What can be the reason?
>
> Thanks,
>
> Danesh
>
>
>
> >----Original message----
> >From: reuti_at_[hidden]
> >Date: 09-04-2008 13:26
> >To: "Open MPI Users"<users_at_[hidden]>
> >Subject: Re: [OMPI users] submitted job stops
> >
> >Hi,
> >
> >On 08.04.2008 at 21:58, Danesh Daroui wrote:
> >> I had posted a message about my problem and I tried all the
> >> suggested solutions, but the problem is not solved. The problem is
> >> that I have installed Open MPI on three machines (1 master + 2
> >> slaves). When I submit a job to the master I can see that the
> >> "orted" daemon is launched on all machines (by running "top" on all
> >> machines), but all "orted" daemons terminate after a few seconds
> >> and nothing happens. At first I thought that it might be because
> >> the remote machines could not launch "orted", but now I am sure
> >> that it can be run on all machines without a problem. Any
> >> suggestions?
> >
> >The question is rather: is your MPI program running successfully, or
> >is there simply no output from mpiexec/mpirun? And by "submit", do
> >you mean that you use a queuing system?
> >
> >-- Reuti
> >_______________________________________________
> >users mailing list
> >users_at_[hidden]
> >http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> End of users Digest, Vol 863, Issue 1
> *************************************
>