Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Non-homogeneous Cluster Implementation
From: Lee Manko (lmanko_at_[hidden])
Date: 2010-02-02 13:23:27


Thanks, I'll give it a try!
Lee Manko

On Tue, Feb 2, 2010 at 10:01 AM, Ralph Castain <rhc_at_[hidden]> wrote:

> Probably the easiest solution is to tell OMPI not to use the second NIC.
> For example, if that NIC is eth1, then you could do this:
>
> mpirun -mca oob_tcp_if_exclude eth1 -mca btl_tcp_if_exclude eth1 ...
>
> This tells both the MPI layer and the RTE to ignore the eth1 interface.
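>
> (If you would rather not type those options on every run, a minimal sketch,
> assuming the default per-user config file $HOME/.openmpi/mca-params.conf,
> is to set the same parameters there:
>
> # $HOME/.openmpi/mca-params.conf
> oob_tcp_if_exclude = eth1
> btl_tcp_if_exclude = eth1
>
> Every mpirun you launch as that user then picks these up automatically.)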
>
>
>
>
> On Tue, Feb 2, 2010 at 10:04 AM, Lee Manko <lmanko_at_[hidden]> wrote:
>
>> Thank you Jody and Ralph. Your suggestions got me up and running (well
>> sort of). I have run into another issue that I was wondering if you have
>> had any experience with. My server has one NIC that is static and a second
>> that is DHCP on a corp network (the only way to get to the outside world).
>> My scatter/gather process does not work when the second NIC is plugged in,
>> but does work when unplugged. It appears to have something to do with DHCP
>> Discovery.
>>
>> Any suggestions?
>>
>> Lee Manko
>>
>>
>>
>> On Thu, Jan 28, 2010 at 11:53 AM, Lee Manko <lmanko_at_[hidden]> wrote:
>>
>>> See, it was a simple thing. Thank you for the information. I am trying
>>> it now; I have to recompile and re-install Open MPI for a heterogeneous
>>> network.
>>>
>>> Now, knowing what to search for, I found that I can set the configuration
>>> of the cluster in a file that mpirun and mpiexec can read.
>>>
>>> mpirun --app my_appfile
>>>
>>>
>>> where the appfile contains the same --host information. This makes
>>> customizing the cluster for certain applications very easy.
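>>>
>>> (A minimal sketch of such an appfile, reusing Jody's placeholder hosts a
>>> and b and the two per-architecture builds app_ps3 and app_dell; each line
>>> is one application context, written with the same options you would give
>>> to mpirun:
>>>
>>> # my_appfile
>>> -np 1 --host b app_dell
>>> -np 1 --host a app_ps3
>>>
>>> It is then launched as "mpirun --app my_appfile".)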
>>>
>>> Thanks for the guidance to this MPI newbie.
>>>
>>> Lee
>>>
>>>
>>>
>>>
>>> On Wed, Jan 27, 2010 at 11:43 PM, jody <jody.xha_at_[hidden]> wrote:
>>>
>>>> Hi
>>>> I'm not sure I completely understood.
>>>> Is it the case that an application compiled on the Dell will not work
>>>> on the PS3 and vice versa?
>>>>
>>>> If this is the case, you could try this:
>>>> shell$ mpirun -np 1 --host a app_ps3 : -np 1 --host b app_dell
>>>> where app_ps3 is your application compiled on the PS3 and a is your PS3
>>>> host, and app_dell is your application compiled on the Dell, and b is
>>>> your Dell host.
>>>>
>>>> Check the MPI FAQs
>>>> http://www.open-mpi.org/faq/?category=running#mpmd-run
>>>> http://www.open-mpi.org/faq/?category=running#mpirun-host
>>>>
>>>> Hope this helps
>>>> Jody
>>>>
>>>> On Thu, Jan 28, 2010 at 3:08 AM, Lee Manko <lmanko_at_[hidden]> wrote:
>>>> > OK, so please stop me if you have heard this before, but I couldn’t
>>>> > find anything in the archives that addressed my situation.
>>>> >
>>>> >
>>>> >
>>>> > I have a Beowulf cluster where ALL the nodes are PS3s running Yellow
>>>> > Dog Linux 6.2 and a host (server) that is a Dell i686 Quad-core running
>>>> > Fedora Core 12. After a failed attempt at letting yum install openmpi, I
>>>> > downloaded v1.4.1, compiled and installed on all machines (PS3s and
>>>> > Dell). I have an NFS shared directory on the host where the application
>>>> > resides after building. All nodes have access to the shared volume and
>>>> > they can see any files in the shared volume.
>>>> >
>>>> >
>>>> >
>>>> > I wrote a very simple master/slave application where the slave does a
>>>> > simple computation and gets the processor name. The slave returns both
>>>> > pieces of information to the master, who then simply displays it in the
>>>> > terminal window. After the slaves work on 1024 such tasks, the master
>>>> > exits.
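>>>> >
>>>> > (For concreteness, here is a minimal sketch of the master/slave pattern
>>>> > just described; the message tags, round-robin task dispatch, and the
>>>> > placeholder computation are illustrative only, not the actual program
>>>> > from this post:
>>>> >
>>>> > #include <mpi.h>
>>>> > #include <stdio.h>
>>>> >
>>>> > #define NTASKS 1024
>>>> >
>>>> > int main(int argc, char *argv[])
>>>> > {
>>>> >     int rank, size, t, s, len;
>>>> >     double result;
>>>> >     char name[MPI_MAX_PROCESSOR_NAME];
>>>> >
>>>> >     MPI_Init(&argc, &argv);
>>>> >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>> >     MPI_Comm_size(MPI_COMM_WORLD, &size);      /* needs >= 2 ranks */
>>>> >
>>>> >     if (rank == 0) {                           /* master */
>>>> >         for (t = 0; t < NTASKS; t++) {
>>>> >             int slave = 1 + (t % (size - 1));  /* round-robin */
>>>> >             MPI_Send(&t, 1, MPI_INT, slave, 0, MPI_COMM_WORLD);
>>>> >             MPI_Recv(&result, 1, MPI_DOUBLE, slave, 1, MPI_COMM_WORLD,
>>>> >                      MPI_STATUS_IGNORE);
>>>> >             MPI_Recv(name, MPI_MAX_PROCESSOR_NAME, MPI_CHAR, slave, 2,
>>>> >                      MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>>>> >             printf("task %d -> %g from %s\n", t, result, name);
>>>> >         }
>>>> >         for (s = 1; s < size; s++) {           /* tell slaves to quit */
>>>> >             int stop = -1;
>>>> >             MPI_Send(&stop, 1, MPI_INT, s, 0, MPI_COMM_WORLD);
>>>> >         }
>>>> >     } else {                                   /* slave */
>>>> >         for (;;) {
>>>> >             int task;
>>>> >             MPI_Recv(&task, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
>>>> >                      MPI_STATUS_IGNORE);
>>>> >             if (task < 0) break;
>>>> >             result = 2.0 * task;               /* placeholder work */
>>>> >             MPI_Get_processor_name(name, &len);
>>>> >             MPI_Send(&result, 1, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD);
>>>> >             MPI_Send(name, MPI_MAX_PROCESSOR_NAME, MPI_CHAR, 0, 2,
>>>> >                      MPI_COMM_WORLD);
>>>> >         }
>>>> >     }
>>>> >     MPI_Finalize();
>>>> >     return 0;
>>>> > }
>>>> >
>>>> > The master above services one slave at a time to keep the sketch short;
>>>> > a real master would keep all slaves busy concurrently.)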
>>>> >
>>>> >
>>>> >
>>>> > When I run on the host, without distributing to the nodes, I use the
>>>> > command:
>>>> >
>>>> >
>>>> >
>>>> > "mpirun -np 4 ./MPI_Example"
>>>> >
>>>> >
>>>> >
>>>> > Compiling and running the application on the native hardware works
>>>> > perfectly (i.e., compiled and run on the PS3 or compiled and run on the
>>>> > Dell).
>>>> >
>>>> >
>>>> >
>>>> > However, when I went to scatter the tasks to the nodes, using the
>>>> > following command,
>>>> >
>>>> >
>>>> >
>>>> > "mpirun -np 4 -hostfile mpi-hostfile ./MPI_Example"
>>>> >
>>>> >
>>>> >
>>>> > the application fails. I’m surmising that the issue is with running code
>>>> > that was compiled for the Dell on the PS3 since the MPI_Init will launch
>>>> > the application from the shared volume.
>>>> >
>>>> >
>>>> >
>>>> > So, I took the source code and compiled it on both the Dell and the
>>>> > PS3 and placed the executables in /shared_volume/Dell and
>>>> > /shared_volume/PS3 and added the paths to the environment variable PATH.
>>>> > I tried to run the application from the host again using the following
>>>> > command,
>>>> >
>>>> >
>>>> >
>>>> > "mpirun -np 4 -hostfile mpi-hostfile -wdir /shared_volume/PS3
>>>> > ./MPI_Example"
>>>> >
>>>> >
>>>> >
>>>> > I was hoping that -wdir would set the working directory at the time of
>>>> > the call to MPI_Init(), so that MPI_Init would launch the PS3 version of
>>>> > the executable.
>>>> >
>>>> >
>>>> >
>>>> > I get the error:
>>>> >
>>>> > Could not execute the executable "./MPI_Example" : Exec format error
>>>> >
>>>> > This could mean that your PATH or executable name is wrong, or that you
>>>> > do not have the necessary permissions. Please ensure that the executable
>>>> > is able to be found and executed.
>>>> >
>>>> >
>>>> >
>>>> > Now, I know I’m gonna get some heat for this, but all of these machines
>>>> > use only the root account with full root privileges, so it’s not a
>>>> > permission issue.
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > I am sure there is a simple solution to my problem. Replacing the host
>>>> > with a PS3 is not an option. Does anyone have any suggestions?
>>>> >
>>>> >
>>>> >
>>>> > Thanks.
>>>> >
>>>> >
>>>> >
>>>> > PS: When I get to programming the Cell BE, then I’ll use the IBM Cell
>>>> > SDK with its cross-compiler toolchain.
>>>> >
>>>>
>>>
>>>
>>
>
>