Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Can not run a parallel job on all the nodes in the cluster
From: Reuti (reuti_at_[hidden])
Date: 2012-03-28 11:40:07


On 28.03.2012 at 17:35, Hameed Alzahrani wrote:

> Hi,
>
> Is there a specific name or location for the hostfile? I could not figure out how to specify the number of processors for each machine on the command line.

No, just specify the name (or path) to it with:

--hostfile foobar
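
For example, assuming the hostfile quoted further down is saved as my_hosts (the file name is arbitrary), the full invocation would be something like:

mpirun -np 8 --hostfile my_hosts hello

With the slot counts quoted below, mpirun should then place 4 ranks on host1 and 2 each on host2 and host3.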

-- Reuti

> Regards,
>
> > From: reuti_at_[hidden]
> > Date: Wed, 28 Mar 2012 17:21:39 +0200
> > To: users_at_[hidden]
> > Subject: Re: [OMPI users] Can not run a parallel job on all the nodes in the cluster
> >
> > Hi,
> >
> > On 28.03.2012 at 16:55, Hameed Alzahrani wrote:
> >
> > > I ran a hello program which returns the host name. When I run it with
> > > mpirun -np 8 hello
> > > all 8 answers come from the same machine. When I run it with
> > > mpirun -np 8 --host host1,host2,host3 hello
> > > I get answers from all the machines, but not from all processors: I have 8 processors (host1=4, host2=2, host3=2), yet the answers were 3 from host1, 3 from host2, and 2 from host3.
> >
> > If you want to specify the number of slots, you can put them in a hostfile (otherwise a round-robin assignment is used). I'm not aware that different values for each machine can be specified on the command line:
> >
> > host1 slots=4
> > host2 slots=2
> > host3 slots=2
> >
> > -- Reuti
> >
> > >
> > > Regards,
> > >
> > > > From: reuti_at_[hidden]
> > > > Date: Wed, 28 Mar 2012 16:42:21 +0200
> > > > To: users_at_[hidden]
> > > > Subject: Re: [OMPI users] Can not run a parallel job on all the nodes in the cluster
> > > >
> > > > Hi,
> > > >
> > > > On 28.03.2012 at 16:30, Hameed Alzahrani wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I mean the node that I run the mpirun command from. I use Condor as a scheduler, but I need to benchmark the cluster either through Condor or directly with Open MPI.
> > > >
> > > > I can't say anything regarding the Condor integration of Open MPI, but starting it directly with mpirun and supplying a valid number of ranks and a hostfile should start processes on the other machines as requested. Can you first run a plain mpihello that outputs rank and hostname? Do you have ssh access to all the machines in question? Do you have a shared home directory with the applications?
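> > > >
> > > > Such an mpihello could be as small as the following (just a sketch; MPI_Get_processor_name usually returns the host name):
> > > >
> > > > /* hello.c - print rank, size and host name; compile with: mpicc hello.c -o hello */
> > > > #include <mpi.h>
> > > > #include <stdio.h>
> > > >
> > > > int main(int argc, char **argv)
> > > > {
> > > >     int rank, size, len;
> > > >     char name[MPI_MAX_PROCESSOR_NAME];
> > > >
> > > >     MPI_Init(&argc, &argv);
> > > >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank */
> > > >     MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of ranks */
> > > >     MPI_Get_processor_name(name, &len);     /* usually the host name */
> > > >     printf("Hello from rank %d of %d on %s\n", rank, size, name);
> > > >     MPI_Finalize();
> > > >     return 0;
> > > > }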
> > > >
> > > > -- Reuti
> > > >
> > > >
> > > > > When I ran mpirun from one machine and checked the memory status on the three machines I have, it appeared that memory usage increased only on that same machine.
> > > > >
> > > > > Regards,
> > > > >
> > > > > > From: reuti_at_[hidden]
> > > > > > Date: Wed, 28 Mar 2012 15:12:17 +0200
> > > > > > To: users_at_[hidden]
> > > > > > Subject: Re: [OMPI users] Can not run a parallel job on all the nodes in the cluster
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > On 27.03.2012 at 23:46, Hameed Alzahrani wrote:
> > > > > >
> > > > > > > When I run any parallel job, I get the answer only from the submitting node.
> > > > > >
> > > > > > What do you mean by submitting node: do you use a queuing system, and if so, which one?
> > > > > >
> > > > > > -- Reuti
> > > > > >
> > > > > >
> > > > > > > Even when I tried to benchmark the cluster using LINPACK, the job appeared to run only on the submitting node. Is there a way to make Open MPI distribute the job evenly across all the nodes according to the number of processors on each node? Even when I specify that the job should use 8 processors, Open MPI seems to use the 4 processors of the submitting node instead of the other processors. I also tried --host, but it does not work correctly for benchmarking the cluster. Does anyone use Open MPI to benchmark a cluster, or does anyone know how to make Open MPI divide a parallel job evenly across every processor in the cluster?
> > > > > > >
> > > > > > > Regards,
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users