Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Non-homogeneous Cluster Implementation
From: Ralph Castain (rhc_at_[hidden])
Date: 2010-02-02 13:01:23


Probably the easiest solution is to tell OMPI not to use the second NIC. For
example, if that NIC is eth1, then you could do this:

mpirun -mca oob_tcp_if_exclude eth1 -mca btl_tcp_if_exclude eth1 ...

This tells both the MPI layer and the RTE to ignore the eth1 interface.
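
The same two parameters can also be set through the environment, which avoids repeating the -mca flags on every mpirun line. A minimal sketch, assuming eth1 is the interface to ignore as in the command above; Open MPI reads an MCA parameter from a variable named OMPI_MCA_<parameter>. The lo entry is an addition not in the command above: an explicit btl_tcp_if_exclude value replaces the default list, which normally covers loopback, so it is listed again here.

```shell
# Assumption: eth1 is the interface to ignore, as in the mpirun example above.
# An explicit exclude list replaces the BTL default, so loopback (lo) is listed again.
export OMPI_MCA_btl_tcp_if_exclude=lo,eth1
export OMPI_MCA_oob_tcp_if_exclude=eth1

# Show what an mpirun started from this shell would inherit.
env | grep '^OMPI_MCA_' | sort
```

Any mpirun started from that shell should then behave as if the -mca options had been given on the command line.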

On Tue, Feb 2, 2010 at 10:04 AM, Lee Manko <lmanko_at_[hidden]> wrote:

> Thank you Jody and Ralph. Your suggestions got me up and running (well
> sort of). I have run into another issue that I was wondering if you have
> had any experience with. My server has one NIC that is static and a second
> that is DHCP on a corp network (the only way to get to the outside world).
> My scatter/gather process does not work when the second NIC is plugged in,
> but does work when unplugged. It appears to have something to do with DHCP
> Discovery.
>
> Any suggestions?
>
> Lee Manko
>
>
>
> On Thu, Jan 28, 2010 at 11:53 AM, Lee Manko <lmanko_at_[hidden]> wrote:
>
>> See, it was a simple thing. Thank you for the information. I am trying
>> it now. Have to recompile and re-install openmpi for a heterogeneous
>> network.
>>
>> Now, knowing what to search for, I found that I can set the configuration
>> of the cluster in a file that mpirun and mpiexec can read.
>>
>> mpirun --app my_appfile
>>
>>
>> where app file contains the same --host information. Makes customizing
>> the cluster for certain applications very easy.
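
A minimal sketch of such an appfile, reusing the two-executable layout from the quoted thread. The hosts a and b are the placeholder names from jody's example, the process counts are arbitrary, and my_appfile is the file name from the command above. Each non-comment line holds the mpirun arguments for one application context.

```shell
# Write a two-context appfile (hosts and process counts are placeholders).
cat > my_appfile <<'EOF'
# One application context per line; lines starting with # are comments.
-np 1 --host a /shared_volume/PS3/MPI_Example
-np 3 --host b /shared_volume/Dell/MPI_Example
EOF

# Show the file; the job itself would be launched with: mpirun --app my_appfile
cat my_appfile
```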
>>
>> Thanks for the guidance to this MPI newbie.
>>
>> Lee
>>
>>
>>
>>
>> On Wed, Jan 27, 2010 at 11:43 PM, jody <jody.xha_at_[hidden]> wrote:
>>
>>> Hi
>>> I'm not sure I completely understood.
>>> Is it the case that an application compiled on the dell will not work
>>> on the PS3 and vice versa?
>>>
>>> If this is the case, you could try this:
>>> shell$ mpirun -np 1 --host a app_ps3 : -np 1 --host b app_dell
>>> where app_ps3 is your application compiled on the PS3 and a is your PS3
>>> host,
>>> and app_dell is your application compiled on the dell, and b is your dell
>>> host.
>>>
>>> Check the MPI FAQs
>>> http://www.open-mpi.org/faq/?category=running#mpmd-run
>>> http://www.open-mpi.org/faq/?category=running#mpirun-host
>>>
>>> Hope this helps
>>> Jody
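
Another way to get a per-architecture binary, sketched here as an assumption rather than something proposed in the thread: launch a small wrapper script on every node and let it choose the executable from uname -m. The ppc* pattern for the PS3s and the fallback to the Dell build are guesses about what uname -m reports on these machines; the two paths are the ones from the original question below.

```shell
#!/bin/sh
# Map a machine type (as printed by `uname -m`) to the matching build.
# Assumption: the PS3 nodes report a ppc* machine type; everything else
# is treated as the Dell host.
select_binary() {
    case "$1" in
        ppc*|powerpc*) echo "/shared_volume/PS3/MPI_Example" ;;
        *)             echo "/shared_volume/Dell/MPI_Example" ;;
    esac
}

binary=$(select_binary "$(uname -m)")
echo "selected binary: $binary"
# In a real wrapper the echo above would be replaced by:
#   exec "$binary" "$@"
```

Saved as an executable script on the shared volume (the name pick_arch.sh is hypothetical), it would be started with something like mpirun -np 4 -hostfile mpi-hostfile /shared_volume/pick_arch.sh, so that the single command line resolves to a different binary on each architecture.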
>>>
>>> On Thu, Jan 28, 2010 at 3:08 AM, Lee Manko <lmanko_at_[hidden]> wrote:
>>> > OK, so please stop me if you have heard this before, but I couldn’t find
>>> > anything in the archives that addressed my situation.
>>> >
>>> >
>>> >
>>> > I have a Beowulf cluster where ALL the nodes are PS3s running Yellow Dog
>>> > Linux 6.2 and a host (server) that is a Dell i686 quad-core running Fedora
>>> > Core 12. After a failed attempt at letting yum install openmpi, I
>>> > downloaded v1.4.1, then compiled and installed it on all machines (PS3s
>>> > and Dell). I have an NFS shared directory on the host where the application
>>> > resides after building. All nodes have access to the shared volume and
>>> > can see any files in it.
>>> >
>>> >
>>> >
>>> > I wrote a very simple master/slave application where the slave does a simple
>>> > computation and gets the processor name. The slave returns both pieces of
>>> > information to the master, which then simply displays them in the terminal
>>> > window. After the slaves work on 1024 such tasks, the master exits.
>>> >
>>> >
>>> >
>>> > When I run on the host, without distributing to the nodes, I use the
>>> > command:
>>> >
>>> >
>>> >
>>> > mpirun -np 4 ./MPI_Example
>>> >
>>> >
>>> >
>>> > Compiling and running the application on the native hardware works perfectly
>>> > (i.e., compiled and run on the PS3, or compiled and run on the Dell).
>>> >
>>> >
>>> >
>>> > However, when I went to scatter the tasks to the nodes, using the following
>>> > command,
>>> >
>>> >
>>> >
>>> > mpirun -np 4 -hostfile mpi-hostfile ./MPI_Example
>>> >
>>> >
>>> >
>>> > the application fails. I’m surmising that the issue is with running code
>>> > that was compiled for the Dell on the PS3, since MPI_Init will launch the
>>> > application from the shared volume.
>>> >
>>> >
>>> >
>>> > So, I took the source code and compiled it on both the Dell and the PS3,
>>> > placed the executables in /shared_volume/Dell and /shared_volume/PS3, and
>>> > added the paths to the environment variable PATH. I tried to run the
>>> > application from the host again using the following command,
>>> >
>>> >
>>> >
>>> > mpirun -np 4 -hostfile mpi-hostfile -wdir /shared_volume/PS3 ./MPI_Example
>>> >
>>> >
>>> >
>>> > hoping that -wdir would set the working directory at the time of the call
>>> > to MPI_Init() so that MPI_Init would launch the PS3 version of the
>>> > executable.
>>> >
>>> >
>>> >
>>> > I get the error:
>>> >
>>> > Could not execute the executable "./MPI_Example": Exec format error
>>> >
>>> > This could mean that your PATH or executable name is wrong, or that you do not
>>> > have the necessary permissions. Please ensure that the executable is able to be
>>> > found and executed.
>>> >
>>> >
>>> >
>>> > Now, I know I’m gonna get some heat for this, but all of these machines use
>>> > only the root account with full root privileges, so it’s not a permission
>>> > issue.
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > I am sure there is a simple solution to my problem. Replacing the host with a
>>> > PS3 is not an option. Does anyone have any suggestions?
>>> >
>>> >
>>> >
>>> > Thanks.
>>> >
>>> >
>>> >
>>> > PS: When I get to programming the Cell BE, then I’ll use the IBM Cell SDK
>>> > with its cross-compiler toolchain.
>>> >
>>> > _______________________________________________
>>> > users mailing list
>>> > users_at_[hidden]
>>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> >
>>>
>>>
>>
>>
>
>