Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Non-homogeneous Cluster Implementation
From: Lee Manko (lmanko_at_[hidden])
Date: 2010-01-28 14:53:14


See, it was a simple thing. Thank you for the information. I am trying it
now. I have to recompile and re-install Open MPI for a heterogeneous network.

Now, knowing what to search for, I found that I can set the configuration of
the cluster in a file that mpirun and mpiexec can read.

mpirun --app my_appfile

where the appfile contains the same --host information. This makes
customizing the cluster for certain applications very easy.
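
For anyone finding this in the archives, here is a minimal sketch of what such
an appfile might look like. It reuses the placeholder names from jody's
example below (hosts a and b, executables app_ps3 and app_dell); these are
assumptions, so substitute your own hosts and paths. The script only writes
and prints the file. The mpirun call is left as a comment because it needs
the real machines.

```shell
#!/bin/sh
# Write a two-line appfile; each line is one application context and takes
# the same options that would otherwise go on the mpirun command line.
cat > my_appfile <<'EOF'
# PS3 build on host a, Dell build on host b (placeholder names)
-np 1 --host a app_ps3
-np 1 --host b app_dell
EOF

# Show what was written.
cat my_appfile

# Launch with (not run here, since it needs the real hosts and binaries):
#   mpirun --app my_appfile
```

This should be equivalent to the colon-separated command line jody suggested,
just kept in a file so it can be edited per application.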

Thanks for the guidance to this MPI newbie.

Lee

On Wed, Jan 27, 2010 at 11:43 PM, jody <jody.xha_at_[hidden]> wrote:

> Hi
> I'm not sure I completely understood.
> Is it the case that an application compiled on the dell will not work
> on the PS3 and vice versa?
>
> If this is the case, you could try this:
> shell$ mpirun -np 1 --host a app_ps3 : -np 1 --host b app_dell
> where app_ps3 is your application compiled on the PS3 and a is your PS3
> host,
> and app_dell is your application compiled on the dell, and b is your dell
> host.
>
> Check the MPI FAQs
> http://www.open-mpi.org/faq/?category=running#mpmd-run
> http://www.open-mpi.org/faq/?category=running#mpirun-host
>
> Hope this helps
> Jody
>
> On Thu, Jan 28, 2010 at 3:08 AM, Lee Manko <lmanko_at_[hidden]> wrote:
> > OK, so please stop me if you have heard this before, but I couldn’t find
> > anything in the archives that addressed my situation.
> >
> >
> >
> > I have a Beowulf cluster where ALL the nodes are PS3s running Yellow Dog
> > Linux 6.2 and a host (server) that is a Dell i686 Quad-core running
> > Fedora Core 12. After a failed attempt at letting yum install openmpi, I
> > downloaded v1.4.1, compiled and installed on all machines (PS3s and
> > Dell). I have an NFS shared directory on the host where the application
> > resides after building. All nodes have access to the shared volume and
> > they can see any files in the shared volume.
> >
> >
> >
> > I wrote a very simple master/slave application where the slave does a
> > simple computation and gets the processor name. The slave returns both
> > pieces of information to the master, which then simply displays them in
> > the terminal window. After the slaves work on 1024 such tasks, the master
> > exits.
> >
> >
> >
> > When I run on the host, without distributing to the nodes, I use the
> > command:
> >
> >
> >
> > mpirun -np 4 ./MPI_Example
> >
> >
> >
> > Compiling and running the application on the native hardware works
> > perfectly (i.e., compiled and run on the PS3 or compiled and run on the
> > Dell).
> >
> >
> >
> > However, when I went to scatter the tasks to the nodes, using the
> > following command,
> >
> >
> >
> > mpirun -np 4 -hostfile mpi-hostfile ./MPI_Example
> >
> >
> >
> > the application fails. I’m surmising that the issue is with running code
> > that was compiled for the Dell on the PS3 since the MPI_Init will launch
> > the application from the shared volume.
> >
> >
> >
> > So, I took the source code and compiled it on both the Dell and the PS3
> > and placed the executables in /shared_volume/Dell and /shared_volume/PS3
> > and added the paths to the environment variable PATH. I tried to run the
> > application from the host again using the following command,
> >
> >
> >
> > mpirun -np 4 -hostfile mpi-hostfile -wdir /shared_volume/PS3 ./MPI_Example
> >
> >
> >
> > I was hoping that the wdir would set the working directory at the time
> > of the call to MPI_Init() so that MPI_Init will launch the PS3 version
> > of the executable.
> >
> >
> >
> > I get the error:
> >
> > Could not execute the executable "./MPI_Example" : Exec format error
> >
> > This could mean that your PATH or executable name is wrong, or that you
> > do not have the necessary permissions. Please ensure that the executable
> > is able to be found and executed.
> >
> >
> >
> > Now, I know I’m gonna get some heat for this, but all of these machines
> > use only the root account with full root privileges, so it’s not a
> > permission issue.
> >
> >
> >
> >
> >
> > I am sure there is a simple solution to my problem. Replacing the host
> > with a PS3 is not an option. Does anyone have any suggestions?
> >
> >
> >
> > Thanks.
> >
> >
> >
> > PS: When I get to programming the Cell BE, then I’ll use the IBM Cell SDK
> > with its cross-compiler toolchain.
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >