On Aug 4, 2005, at 6:43 AM, Jeff Squyres wrote:
>> I got OpenMPI tar ball and could configure and build on AMD x86_64
>> arch.

Excellent.  Note, however, that it's probably better to get a
Subversion checkout.  As this is the current head of our development
tree, it's a constantly moving target -- having a Subversion checkout
will help you keep up with our progress.

>> In our case, we need to enable MVAPI and disable OpenIB. For this, I
>> have moved .ompi_ignore file from mvapi directory to openib directory.
>> I could see that OpenIB was disabled as the entire openib tree was
>> skipped by the autogen.sh script.

It depends on what version of the tarball you got -- in the version
that I have, the mvapi components (both btl and mpool) do not have
.ompi_ignore files (we recently removed them -- July 27th, r6613).

Additionally, you should not need to run autogen.sh in a tarball (in
fact, autogen.sh should warn you if you try to do this).  autogen.sh is
only required in a Subversion checkout.  Please see the top-level
HACKING file in a Subversion checkout (I don't think that it is
included in the tarball).
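
As a rough illustration of the tarball-versus-checkout distinction
above, here is a minimal sketch that prints which build steps apply to
a source tree.  The build_steps helper is hypothetical, and it assumes
a Subversion working copy can be recognized by a .svn directory at its
top level; it only prints the commands and does not run them.

```shell
#!/bin/sh
# Print the build steps appropriate for the source tree in $1.
# Assumption: a Subversion checkout has a .svn directory at its top
# level, while an unpacked tarball does not.
build_steps() {
    if [ -d "$1/.svn" ]; then
        # Checkout: the configure script must be generated first.
        echo "./autogen.sh && ./configure && make all"
    else
        # Tarball: configure is already shipped; autogen.sh is not needed.
        echo "./configure && make all"
    fi
}

# Default to the current directory when no argument is given.
build_steps "${1:-.}"
```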

Finally, note that you'll need to give additional --with options to
configure to tell it where the MVAPI libraries and header files are
located -- more on this below.

>> While running Pallas across the nodes, I could see that data is
>> passing over Gigabit ethernet and NOT over Infiniband. Does anyone
>> have any idea about why data is going through Gig and NOT over
>> infiniband? Do I have to set any configuration options? Do I have to
>> give any run-time options? I have tried with mpirun -mca btl mvapi
>> but of no use.

What is the output of the ompi_info command?  This will tell you if the
mvapi component is compiled and installed (it sounds like it is not).

>> I could make out that TCP component is being used and in order to
>> disable tcp, I have copied .ompi_ignore into directories
>> /ompi/orte/mca/oob/tcp and /ompi/ompi/mca/ptl/tcp. But this time
>> program fails with segmentation fault error.

Right now, IIRC, we don't have checks to ensure that there are valid
paths from one MPI process to another -- which is probably the cause of
the seg fault.

Also note that .ompi_ignore is an autogen mechanism.  It is really
intended for developers who want to protect parts of the tree during
development when it is not ready for general use.  It is not really
intended

>> These are the configure options that I have given while configuring
>> OpenMPI.
>>
>> ./configure --prefix=/openmpi --with-btl-mvapi=/usr/local/topspin/
>> --with-btl-mvapi-libdir=/usr/local/topspin --with-mvapi

Almost correct.  Check out ./configure --help:

  --with-btl-mvapi=MVAPI_DIR
                          Additional directory to search for MVAPI
                          installation
  --with-btl-mvapi-libdir=IBLIBDIR
                          directory where the IB library can be found, if it
                          is not in MVAPI_DIR/lib or MVAPI_DIR/lib64

The --with-btl-mvapi-libdir flag is only necessary if the MVAPI library
cannot be found in /usr/local/topspin/lib or /usr/local/topspin/lib64.
There is no --with-mvapi flag.

So it's quite possible that with the wrong value for
--with-btl-mvapi-libdir, it failed to compile the mvapi component
(i.e., I suspect it was looking for /usr/local/topspin/libmosal.* when
libmosal is most likely in /usr/local/topspin/lib or
/usr/local/topspin/lib64), which resulted in Open MPI falling back to
TCP/GigE.
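
To make the libdir reasoning above concrete, here is a minimal sketch
that looks for libmosal.* under an MVAPI directory and prints (without
running) a configure line, adding --with-btl-mvapi-libdir only when the
library sits outside lib and lib64.  The helper names find_mosal_dir
and suggest_configure and the MVAPI_DIR environment variable are
hypothetical; the --prefix value and the /usr/local/topspin default are
taken from the command quoted earlier.

```shell
#!/bin/sh
# Print the first directory under $1 that contains libmosal.*, if any.
find_mosal_dir() {
    for d in "$1/lib" "$1/lib64" "$1"; do
        for f in "$d"/libmosal.*; do
            # An unmatched glob stays literal, so test for existence.
            if [ -e "$f" ]; then
                echo "$d"
                return 0
            fi
        done
    done
    return 1
}

# Build (but do not run) a configure command line for the MVAPI dir in $1.
suggest_configure() {
    mvapi_dir="${1%/}"    # drop one trailing slash, if present
    cmd="./configure --prefix=/openmpi --with-btl-mvapi=$mvapi_dir"
    # If libmosal is not found at all, print the basic command unchanged.
    libdir="$(find_mosal_dir "$mvapi_dir")" || { echo "$cmd"; return 0; }
    case "$libdir" in
        "$mvapi_dir/lib"|"$mvapi_dir/lib64")
            ;;  # searched by default; no libdir flag needed
        *)
            cmd="$cmd --with-btl-mvapi-libdir=$libdir"
            ;;
    esac
    echo "$cmd"
}

suggest_configure "${MVAPI_DIR:-/usr/local/topspin}"
```

On a typical layout where libmosal lives in lib or lib64, this prints
the configure line with only --prefix and --with-btl-mvapi.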

After you install Open MPI, you can run the ompi_info command and it
will show a list of all the installed components.  You should see the
mvapi component in both the btl and mpool frameworks if all went well.
If it didn't, then send us the output (stdout and stderr) of configure,
the top-level config.log file, and the output from "make all" (please
compress!) and we can have a look to see what went wrong.
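
The ompi_info check described above could be scripted roughly as
follows.  This is only a sketch: the has_mvapi helper is hypothetical,
and it assumes ompi_info lists each component on a line of the form
"MCA <framework>: <component> ...", which may not match every version
of the tool.

```shell
#!/bin/sh
# Succeed if ompi_info-style text on stdin lists an mvapi component
# for the framework named in $1 (e.g. btl or mpool).
# Assumption: component lines look like "MCA btl: mvapi (...)".
has_mvapi() {
    grep "MCA $1:" | grep -q "mvapi"
}

if command -v ompi_info >/dev/null 2>&1; then
    info="$(ompi_info 2>&1)"
    for fw in btl mpool; do
        if printf '%s\n' "$info" | has_mvapi "$fw"; then
            echo "$fw: mvapi component found"
        else
            echo "$fw: mvapi component NOT found"
        fi
    done
else
    echo "ompi_info not found in PATH; is Open MPI installed?"
fi
```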

Once you have the mvapi components built, you can choose to use them at
run-time via switches to mpirun.  See the slides that we talked through
on the teleconference -- I provided some examples (you can set these
via command line arguments, environment variables, or files).

For one thing, you need to manually specify to use the 3rd generation
p2p stuff in Open MPI -- our 2nd generation is still currently the
default (that will likely change in the near future, but it hasn't been
done yet).  For example:

    mpirun --mca pml ob1 --mca btl mvapi,self -np 4 a.out

This tells the pml to use the "ob1" component (i.e., the 3rd generation
p2p stuff) and to use the mvapi and self btl components (self is
loopback -- one process sending to itself).
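
For the environment-variable and file forms mentioned above, a minimal
sketch follows.  It assumes MCA parameters are read from variables
named OMPI_MCA_<param> and from $HOME/.openmpi/mca-params.conf, which
is how released versions of Open MPI document it; whether the current
development tree behaves identically is an assumption.  The script only
calls mpirun if it is installed and ./a.out exists.

```shell
#!/bin/sh
# Same selection as "--mca pml ob1 --mca btl mvapi,self", expressed
# through the environment.
# Assumption: variables named OMPI_MCA_<param> set MCA parameters.
export OMPI_MCA_pml=ob1
export OMPI_MCA_btl=mvapi,self

# File form (assumed location: $HOME/.openmpi/mca-params.conf):
#   pml = ob1
#   btl = mvapi,self

if command -v mpirun >/dev/null 2>&1 && [ -x ./a.out ]; then
    mpirun -np 4 ./a.out
else
    echo "would run: mpirun -np 4 a.out (pml=$OMPI_MCA_pml, btl=$OMPI_MCA_btl)"
fi
```

Command-line switches are handy for one-off experiments, while the
environment or file forms save retyping the same selection on every
run.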
Give that a whirl and let us know how it goes.
--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/