From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-07-18 07:52:04


On Jul 17, 2007, at 10:54 AM, Shai Venter wrote:

> I know you guys are busy. Any attention to my questions will be
> most appreciated.

Glad to help; sorry I didn't get to this yesterday.

> Is there a FAQ for MTT? Maybe some of my questions have been
> answered before.

Most of the information on MTT is currently on the MTT wiki:

     https://svn.open-mpi.org/trac/mtt/wiki

We have a paper about MTT being published at the Euro PVM/MPI
conference at the beginning of October. The plan is to release MTT
as an open source package at the conference (including having a web
site for it).

> I intend to run MTT and test some performance over InfiniBand. My
> setup is two Dell uni-core machines (i.e., sw160 and sw170) running
> SLES10.0. Each host has an InfiniBand HCA card installed. Each HCA
> card has 2 physical ports, which are assigned unique IPs (i.e.,
> 11.4.3.160, 12.4.3.160 and 11.4.3.170, 12.4.3.170 respectively).
>
> Ports are connected back-to-back (port1 <-> port1 and port2 <-> port2)

Ok.

> Q #1: Can I override the INI file value for hostlist on the command
> line? If yes, please provide an example.

Yes. You can override any INI file value on the command line. For
example:

  ./mtt ... field=value

I *believe* that these field=value tokens must be last on the command
line, after all --flags, etc.
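
For instance, overriding the hostlist from Q #2 below would look
something like this (I'm guessing at the INI file name here; check
./mtt --help for the exact flag spellings in your version):

  ./mtt --file my-config.ini hostlist="sw160 sw170"

Note the quotes, so that the multi-host value survives the shell.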

Check out the ./mtt --help message for a list of all the command line
options.

> Q #2: In your opinion, what should I specify for hostlist in order
> to run MPI jobs over my InfiniBand fabric?
>
> Is it hostlist = sw160 sw170
>
> Is it hostlist = 11.4.3.160 12.4.3.160 11.4.3.170 12.4.3.170

Using either the names or IP addresses is fine; MTT doesn't really
differentiate between them. However, if you want to start 2 MPI
procs on each host, then you need to either list the name/IP twice
or use the :num_procs notation, like this:

     hostlist = node1:2 node2:2 node3:2 node4:2
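
With your two hosts, for example, 2 processes on each would be:

     hostlist = sw160:2 sw170:2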

Did you look at the ompi-core-template.ini file in the samples
directory? It has a lot of comments in it explaining the fields, etc.

> Q #3: How do I determine which interfaces MPI uses?

In the Cisco setup, I use MCA parameters to specifically force which
interfaces to use. For example, in my MPI Details section, I have
the following:

-----
[MPI Details: Open MPI]
exec = mpirun -np &test_np() @mca@ \
    --mca btl_tcp_if_include ib0 --mca oob_tcp_if_include ib0 \
    --prefix &test_prefix() &test_executable() &test_argv()

mca = &enumerate( \
         "--mca btl sm,tcp,self @common_params@", \
         "--mca btl tcp,self @common_params@", \
         "--mca btl sm,openib,self @common_params@", \
         "--mca btl openib,self @common_params@", \
         "--mca mpi_leave_pinned 1 --mca btl openib,self @common_params@", \
         "--mca mpi_leave_pinned_pipeline 1 --mca btl openib,self @common_params@", \
         "--mca btl_openib_use_eager_rdma 0 --mca btl openib,self @common_params@", \
         "--mca btl_openib_use_srq 1 --mca btl openib,self @common_params@", \
         "--mca mpi_leave_pinned 1 --mca btl sm,openib,self @common_params@" )

common_params = --mca btl_openib_max_btls 1

-----

So I'm using "--mca btl ..." to force exactly which OMPI BTLs are
used. In short: MTT doesn't know or care which interfaces you use;
it just lets you set exactly which command lines are used to execute
MPI jobs.
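
If all you want to exercise is the native IB path (the openib BTL)
between your two hosts, a much smaller MPI Details section is enough.
Here's a rough sketch along the same lines as the one above (same
funclets, just fewer variants):

-----
[MPI Details: Open MPI]
exec = mpirun -np &test_np() @mca@ --prefix &test_prefix() \
    &test_executable() &test_argv()

mca = &enumerate("--mca btl openib,self", \
                 "--mca btl sm,openib,self")
-----

The --mca btl_tcp_if_include ib0 argument in my exec line above only
matters when the tcp BTL is in the mix; it pins MPI's TCP traffic to
the IPoIB interface.
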
> Q #4: How can I determine the max number of processes for my setup?
> In the case of hostlist = sw160 sw170, MTT will evaluate max_np to 2.
>
> In the case of hostlist = 11.4.3.160 12.4.3.160 11.4.3.170
> 12.4.3.170, max_np will result in 4.

Right: in the first case, you listed each host once, so MTT assumes
that you only want one process per host. In the second case, you
list each host twice, so MTT assumes that you want two processes per
host.
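
(Equivalently, with the :num_procs notation, "sw160:2 sw170:2" would
also give you max_np = 4.)
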
> Q #5: In what terms can I ask MTT to use a local scratch directory
> on one of the hosts' hard drives, as opposed to some shared scratch
> folder on a network file system?

We don't really have a good solution for this at the moment; the
issue is that we need to have the target MPI installed and available
on all the nodes where you want to run. So we took the easy way out
and just have a single scratch that spans all nodes (usually on a
network filesystem). A better solution would be to have a local
scratch and a global scratch: building the MPI could take place in
the local scratch and the install could go to the global scratch.
But we never got around to implementing that... patches would be
welcome. ;-)

-- 
Jeff Squyres
Cisco Systems