OK, so please stop me if you have heard this before, but I couldn't find
anything in the archives that addressed my situation.
I have a Beowulf cluster where ALL the nodes are PS3s running Yellow Dog
Linux 6.2, and a host (server) that is a Dell i686 quad-core running Fedora
Core 12. After a failed attempt at letting yum install Open MPI, I
downloaded v1.4.1, compiled it, and installed it on all machines (PS3s and Dell). I
have an NFS shared directory on the host where the application resides after
building. All nodes have access to the shared volume and can see any
files in it.
I wrote a very simple master/slave application where the slave does a simple
computation and gets the processor name. The slave returns both pieces of
information to the master, which then simply displays them in the terminal
window. After the slaves have worked through 1024 such tasks, the master exits.
When I run on the host, without distributing to the nodes, I use the command:
mpirun -np 4 ./MPI_Example
Compiling and running the application on the native hardware works perfectly
(i.e., compiled and run on the PS3, or compiled and run on the Dell).
However, when I try to scatter the tasks to the nodes using the following command:
mpirun -np 4 --hostfile mpi-hostfile ./MPI_Example
the application fails. I'm surmising that the issue is running code
that was compiled for the Dell on the PS3, since MPI_Init will launch the
application from the shared volume.
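The contents of mpi-hostfile aren't shown here. For reference, a minimal sketch of an Open MPI hostfile for a cluster like this one, assuming hypothetical host names (dell-host, ps3-node1 and so on) and one slot per machine, looks like:

```
# mpi-hostfile: one host per line (host names are hypothetical)
dell-host  slots=1
ps3-node1  slots=1
ps3-node2  slots=1
ps3-node3  slots=1
```

Each name must resolve from the host running mpirun, and `slots` tells Open MPI how many processes it may place on that machine.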
So I took the source code, compiled it on both the Dell and the PS3,
placed the executables in /shared_volume/Dell and /shared_volume/PS3, and
added both paths to the PATH environment variable. I then tried to run the
application from the host again using the following command:
mpirun -np 4 --hostfile mpi-hostfile -wdir
hoping that -wdir would set the working directory at the time of the call
to MPI_Init() so that MPI_Init would launch the PS3 version of the
application. Instead, I get the error:
Could not execute the executable "./MPI_Example": Exec format error
This could mean that your PATH or executable name is wrong, or that you do not
have the necessary permissions. Please ensure that the executable is able to be
found and executed.
Now, I know I'm gonna get some heat for this, but all of these machines use
only the root account with full root privileges, so it's not a permissions issue.
I am sure there is a simple solution to my problem. Replacing the host with a
PS3 is not an option. Does anyone have any suggestions?
PS: When I get to programming the Cell BE, I'll use the IBM Cell SDK
with its cross-compiler toolchain.