Open MPI User's Mailing List Archives

From: Pierre Valiron (Pierre.valiron_at_[hidden])
Date: 2006-03-14 04:42:59

I am now attempting to tune openmpi-1.1a1r9260 on Solaris Opteron.

Each quad-processor node has two Ethernet interfaces, bge0 and bge1.
The bge0 interfaces are dedicated to parallel jobs, correspond to node
names pxx, and use a dedicated gigabit switch.
The bge1 interfaces provide NFS sharing etc., correspond to node names
nxx, and go through another gigabit switch.

1) I allocated 4 quad-processor nodes.
As documented in the FAQ, mpirun -np 4 -hostfile $OAR_FILE_NODES runs all
4 tasks on the first SMP node, and mpirun -np 4 -hostfile $OAR_FILE_NODES
--bynode places one task on each node.
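
For illustration, the two placements described in point 1 can be sketched as below. This is only a toy model of by-slot versus by-node mapping, not Open MPI's actual scheduler; the host names p01..p04 and the figure of 4 slots per node are assumptions chosen to match the quad-processor nodes described above.

```python
def map_by_slot(hosts, slots_per_host, nprocs):
    """Fill every slot on a host before moving on to the next host."""
    placement = []
    for host in hosts:
        for _ in range(slots_per_host):
            if len(placement) == nprocs:
                return placement
            placement.append(host)
    return placement


def map_by_node(hosts, slots_per_host, nprocs):
    """Hand out one rank per host in turn, cycling round-robin."""
    capacity = len(hosts) * slots_per_host
    return [hosts[rank % len(hosts)] for rank in range(min(nprocs, capacity))]


# Hypothetical host names, 4 slots each (assumption).
hosts = ["p01", "p02", "p03", "p04"]
print(map_by_slot(hosts, 4, 4))  # default placement
print(map_by_node(hosts, 4, 4))  # placement with --bynode
```

With 4 ranks, the first function puts every rank on the first host, while the second puts one rank on each host, which matches the behaviour observed with and without --bynode.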

2) According to the users list, mpirun --mca pml teg should revert to the
2nd generation TCP transport instead of the default ob1 (3rd generation).
Unfortunately I get the message
No available pml components were found!
Have you removed the 2nd generation TCP transport? Do you consider the
new ob1 competitive now?

3) According to the users list, tuned collective primitives are
available. Apparently they are now compiled by default, but they don't
seem functional at all:

mpirun --mca coll tuned
Signal:11 info.si_errno:0(Error 0) si_code:1(SEGV_MAPERR)
Failing at addr:0
*** End of error message ***

4) According to the FAQ and to the users list, openmpi attempts to
discover and use all interfaces. I attempted to force using bge0 only
with no success.

mpirun --mca btl_tcp_if_exclude bge1
[n33:04784] *** An error occurred in MPI_Barrier
[n33:04784] *** on communicator MPI_COMM_WORLD
[n33:04784] *** MPI_ERR_INTERN: internal error
[n33:04784] *** MPI_ERRORS_ARE_FATAL (goodbye)
1 process killed (possibly by Open MPI)

The FAQ states that a new syntax should be available soon. I tested
whether it is already implemented in openmpi-1.1a1r9260:

mpirun --mca btl_tcp_if ^bge0,bge1
mpirun --mca btl_tcp_if ^bge1
Both commands work, with identical performance.

However, I doubt this option has any effect, because if I disable all
Ethernet interfaces,
mpirun --mca btl_tcp_if ^bge0,bge1
the job still works!

I would be happy to have more control over the interfaces being used.
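
For illustration, the kind of control asked for here can be sketched as a simple include/exclude filter over interface names. This is a minimal sketch of the expected semantics, assuming the two lists are mutually exclusive; the function name select_interfaces is hypothetical and the code does not describe how Open MPI implements its btl_tcp_if_* parameters.

```python
def select_interfaces(available, include=None, exclude=None):
    """Return the interfaces a transport would be allowed to use.

    Assumed semantics (illustrative only): an include list keeps just
    the named interfaces, an exclude list drops the named interfaces,
    and giving both at once is treated as an error.
    """
    if include and exclude:
        raise ValueError("include and exclude lists are mutually exclusive")
    if include:
        return [name for name in available if name in include]
    if exclude:
        return [name for name in available if name not in exclude]
    return list(available)


interfaces = ["bge0", "bge1"]
print(select_interfaces(interfaces, exclude=["bge1"]))          # only bge0 remains
print(select_interfaces(interfaces, exclude=["bge0", "bge1"]))  # nothing remains
```

Under these semantics, excluding both bge0 and bge1 leaves no usable interface, so a multi-node job that still runs with that setting suggests the setting is not being applied.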

What is expected to work on other platforms?
What issues could be specific to Solaris Opteron?

Have a nice openmpi day!

Support the SAUVONS LA RECHERCHE movement:
        _/_/_/_/    _/       _/       Dr. Pierre VALIRON
       _/     _/   _/      _/   Laboratoire d'Astrophysique
      _/     _/   _/     _/    Observatoire de Grenoble / UJF
     _/_/_/_/    _/    _/    BP 53  F-38041 Grenoble Cedex 9 (France)
    _/          _/   _/
   _/          _/  _/     Mail: Pierre.Valiron_at_[hidden]
  _/          _/ _/      Phone: +33 4 7651 4787  Fax: +33 4 7644 8821
 _/          _/_/