Hello Ryan,

I have been running a similar heterogeneous setup in my lab, i.e., a mix of ppc64 and x86_64 systems connected by Ethernet and InfiniBand.  In trying to replicate your problem, what I see is that it is not an issue of processor heterogeneity but of heterogeneous transports.  Can you remove the openib specifier from the btl lists in the appfile and try again, i.e., force all inter-system communication over Ethernet?  For me, that works.  But if I mix systems that have IB with systems that do not, I too see a hang, even when the processor architectures are the same.  If you can confirm that your case is the same, then we can be sure we're chasing only one problem and not two.
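
For reference, here is a rough sketch of what the appfile would look like with openib dropped from the two ppc entries.  It assumes the same placeholder hosts and paths as in your original message, with each entry kept on a single line; adjust as needed for your actual install:

# cat appfile
-np 1 --hostfile machfile --host <ppc_host_1> --mca btl sm,self,tcp /path/to/ppc/openmpi-1.2.5/examples/hello
-np 1 --hostfile machfile --host <ppc_host_2> --mca btl sm,self,tcp /path/to/ppc/openmpi-1.2.5/examples/hello
-np 1 --hostfile machfile --host <x86_host> --mca btl sm,self,tcp /path/to/x86/openmpi-1.2.5/examples/hello

With that in place, the same "mpirun --app appfile" invocation should send all inter-system traffic over tcp.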

Thanks,
--Brad

Brad Benton
IBM




On Thu, May 1, 2008 at 11:02 AM, Ryan Buckley ; 21426 <rbuckley@mc.com> wrote:
Hello,

I am trying to run a simple Hello World MPI application in a
heterogeneous environment.  The machines include 1 x86 machine with a
standard 1Gb ethernet connection and 2 ppc machines with standard 1Gb
ethernet as well as a 10Gb ethernet (Infiniband) switch between the two.
The Hello World program is the same hello_c.c that is included in the
examples directory of the Open MPI installation.

The goal is that I would like to run heterogeneous applications between
the three aforementioned machines in the following manner:

       The x86 machine will use tcp to communicate to the 2 ppc machines,
while the ppc machines will communicate with one another via the 10GbE.

               x86 <--tcp--> ppc_1
               x86 <--tcp--> ppc_2
               ppc_1 <--openib--> ppc_2

I am currently using a machfile set up as follows,

# cat machfile
<ppc_host_1>
<ppc_host_2>
<x86_host>

In addition I am using an appfile set up as follows,

# cat appfile
-np 1 --hostfile machfile --host <ppc_host_1> --mca btl sm,self,tcp,openib /path/to/ppc/openmpi-1.2.5/examples/hello
-np 1 --hostfile machfile --host <ppc_host_2> --mca btl sm,self,tcp,openib /path/to/ppc/openmpi-1.2.5/examples/hello
-np 1 --hostfile machfile --host <x86_host> --mca btl sm,self,tcp /path/to/x86/openmpi-1.2.5/examples/hello

I am running on the command line via

# mpirun --app appfile

I've also attached the output from 'ompi_info --all' from all machines.

Any suggestions would be much appreciated.

Thanks,

Ryan

