Open MPI User's Mailing List Archives


From: Galen Shipman (gshipman_at_[hidden])
Date: 2006-11-27 10:56:33


Here is a paper on PML OB1:
http://www.open-mpi.org/papers/euro-pvmmpi-2006-hpc-protocols

There is also some information in this paper:
http://www.open-mpi.org/papers/ipdps-2006

For a very detailed presentation on OB1 go here:
http://www.open-mpi.org/papers/workshop-2006/wed_01_pt2pt.pdf

In general we have three higher-level Point-to-point Messaging Layers
(PMLs):

OB1 - Default high-performance PML for networks that do not provide
      higher-level MPI semantics (read: don't provide matching).
DR  - Network-fault-tolerant PML, again for networks that do not
      provide higher-level MPI semantics.
      Both OB1 and DR use the BTL (Byte Transfer Layer) interface
      described in the above papers.
      Currently supported BTLs: GM, Mvapi, MX, OpenIB, SM, TCP, UDAPL

CM  - High-performance PML for networks that DO provide higher-level
      MPI semantics.
      CM uses the MTL (Matching Transfer Layer) interface.
      Currently supported MTLs: MX, PSM (InfiniPath), Portals

Note that MX is supported as both a BTL and an MTL; I would recommend
using the MX MTL, as the performance is much better. If you are using
GM you can only use OB1 or DR; I would recommend OB1, as DR is only
available in the trunk and is still in development.
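
A quick way to check which PML, MTL, and BTL components your build
actually contains (a rough sketch, assuming ompi_info from that build
is on your PATH) is to filter ompi_info's component listing:

# lists lines such as 'MCA pml: ob1 (...)'; exact formatting varies by version
ompi_info | grep "MCA pml"
ompi_info | grep "MCA mtl"
ompi_info | grep "MCA btl"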

To choose a specific PML at runtime, use the MCA parameter facilities;
for example:

mpirun -np 2 -mca pml cm ./mpi-ping
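
The same MCA parameter can also be set outside the mpirun command
line; as a sketch (the OMPI_MCA_ environment-variable prefix and the
$HOME/.openmpi/mca-params.conf file location assume a default Open MPI
install):

# environment-variable form, equivalent to "-mca pml cm":
export OMPI_MCA_pml=cm
mpirun -np 2 ./mpi-ping

# or add this line to $HOME/.openmpi/mca-params.conf:
# pml = cm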

On Nov 27, 2006, at 7:48 AM, Brock Palen wrote:

> Well, I'm not finding much good information on what 'pml' is, which
> ones are available, which one is used by default, or how to switch
> between them. Is there a paper someplace that describes this?
>
> Brock Palen
> Center for Advanced Computing
> brockp_at_[hidden]
> (734)936-1985
>
>
> On Nov 26, 2006, at 11:10 AM, Galen Shipman wrote:
>
>> Oh, I just noticed you are using GM; PML CM is only available for
>> MX. Sorry.
>> Galen
>>
>>
>>
>> On Nov 26, 2006, at 9:08 AM, Galen Shipman wrote:
>>
>>> I would suggest trying Open MPI 1.2b1 and PML CM. You can select
>>> PML CM at runtime via:
>>>
>>> mpirun -mca pml cm
>>>
>>> Have you tried this?
>>>
>>> - Galen
>>>
>>>
>>>
>>> On Nov 21, 2006, at 12:28 PM, Scott Atchley wrote:
>>>
>>>> On Nov 21, 2006, at 1:27 PM, Brock Palen wrote:
>>>>
>>>>> I had sent a message two weeks ago about this problem and talked
>>>>> with Jeff at SC06 about how it might not be an OMPI problem. But
>>>>> working with Myricom, it now appears that it is a problem in both
>>>>> lam-7.1.2 and openmpi-1.1.2/1.1.1. Basically, the results from an
>>>>> HPL run are wrong; it also causes a large number of packets to be
>>>>> dropped by the fabric.
>>>>>
>>>>> This problem does not happen when using mpichgm: the number of
>>>>> dropped packets does not go up. There is a ticket open with
>>>>> Myricom on this. They are a member of the group working on OMPI,
>>>>> but I sent this out just to bring the list up to date.
>>>>>
>>>>> If you have any questions feel free to ask me. The details are in
>>>>> the archive.
>>>>>
>>>>> Brock Palen
>>>>
>>>> Hi all,
>>>>
>>>> I am looking into this at Myricom.
>>>>
>>>> So far, I have compiled OMPI version 1.2b1 using the
>>>> --with-gm=/path/to/gm flag. I have compiled HPCC (contains HPL)
>>>> using OMPI's mpicc. Trying to run hpcc fails with "Myrinet/GM on
>>>> host fog33 was unable to find any NICs". See mpirun output below.
>>>>
>>>> I run gm_board_info and it finds two NICs.
>>>>
>>>> I run ompi_info and it has the gm btl (see ompi_info below).
>>>>
>>>> I have tried using the --prefix flag to mpirun as well as setting
>>>> PATH and LD_LIBRARY_PATH.
>>>>
>>>> What am I missing?
>>>>
>>>> Scott
>>>>
>>>>
>>>> % ompi_info -param btl gm
>>>> MCA btl: parameter "btl_base_debug" (current value: "0")
>>>>          If btl_base_debug is 1 standard debug is output, if > 1
>>>>          verbose debug is output
>>>> MCA btl: parameter "btl" (current value: <none>)
>>>>          Default selection set of components for the btl framework
>>>>          (<none> means "use all components that can be found")
>>>> MCA btl: parameter "btl_base_verbose" (current value: "0")
>>>>          Verbosity level for the btl framework (0 = no verbosity)
>>>> MCA btl: parameter "btl_gm_free_list_num" (current value: "8")
>>>> MCA btl: parameter "btl_gm_free_list_max" (current value: "-1")
>>>> MCA btl: parameter "btl_gm_free_list_inc" (current value: "8")
>>>> MCA btl: parameter "btl_gm_debug" (current value: "0")
>>>> MCA btl: parameter "btl_gm_mpool" (current value: "gm")
>>>> MCA btl: parameter "btl_gm_max_ports" (current value: "16")
>>>> MCA btl: parameter "btl_gm_max_boards" (current value: "4")
>>>> MCA btl: parameter "btl_gm_max_modules" (current value: "4")
>>>> MCA btl: parameter "btl_gm_num_high_priority" (current value: "8")
>>>> MCA btl: parameter "btl_gm_num_repost" (current value: "4")
>>>> MCA btl: parameter "btl_gm_port_name" (current value: "OMPI")
>>>> MCA btl: parameter "btl_gm_exclusivity" (current value: "1024")
>>>> MCA btl: parameter "btl_gm_eager_limit" (current value: "32768")
>>>> MCA btl: parameter "btl_gm_min_send_size" (current value: "32768")
>>>> MCA btl: parameter "btl_gm_max_send_size" (current value: "65536")
>>>> MCA btl: parameter "btl_gm_min_rdma_size" (current value: "524288")
>>>> MCA btl: parameter "btl_gm_max_rdma_size" (current value: "131072")
>>>> MCA btl: parameter "btl_gm_flags" (current value: "50")
>>>> MCA btl: parameter "btl_gm_bandwidth" (current value: "250")
>>>> MCA btl: parameter "btl_gm_priority" (current value: "0")
>>>> MCA btl: parameter "btl_base_warn_component_unused" (current value: "1")
>>>>          This parameter is used to turn on warning messages when
>>>>          certain NICs are not used
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> % mpirun --prefix $OMPI -np 4 --host fog33,fog33,fog34,fog34 -mca btl self,sm,gm ./hpcc
>>>> --------------------------------------------------------------------------
>>>> [0,1,1]: Myrinet/GM on host fog33 was unable to find any NICs.
>>>> Another transport will be used instead, although this may result in
>>>> lower performance.
>>>> --------------------------------------------------------------------------
>>>> --------------------------------------------------------------------------
>>>> [0,1,0]: Myrinet/GM on host fog33 was unable to find any NICs.
>>>> Another transport will be used instead, although this may result in
>>>> lower performance.
>>>> --------------------------------------------------------------------------
>>>> --------------------------------------------------------------------------
>>>> Process 0.1.3 is unable to reach 0.1.0 for MPI communication.
>>>> If you specified the use of a BTL component, you may have
>>>> forgotten a component (such as "self") in the list of
>>>> usable components.
>>>> --------------------------------------------------------------------------
>>>> --------------------------------------------------------------------------
>>>> Process 0.1.1 is unable to reach 0.1.2 for MPI communication.
>>>> If you specified the use of a BTL component, you may have
>>>> forgotten a component (such as "self") in the list of
>>>> usable components.
>>>> --------------------------------------------------------------------------
>>>> --------------------------------------------------------------------------
>>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>>> likely to abort. There are many reasons that a parallel process can
>>>> fail during MPI_INIT; some of which are due to configuration or
>>>> environment problems. This failure appears to be an internal failure;
>>>> here's some additional information (which may only be relevant to an
>>>> Open MPI developer):
>>>>
>>>> PML add procs failed
>>>> --> Returned "Unreachable" (-12) instead of "Success" (0)
>>>> --------------------------------------------------------------------------
>>>> *** An error occurred in MPI_Init
>>>> *** before MPI was initialized
>>>> *** MPI_ERRORS_ARE_FATAL (goodbye)
>>>>
>>>>
>>>>
>>>> % ls -l $OMPI
>>>> total 1
>>>> drwx------ 2 atchley softies 496 Nov 21 13:01 bin
>>>> drwx------ 2 atchley softies 168 Nov 21 13:01 etc
>>>> drwx------ 3 atchley softies 184 Nov 21 13:01 include
>>>> drwx------ 3 atchley softies 896 Nov 21 13:01 lib
>>>> drwx------ 4 atchley softies 96 Nov 21 13:01 man
>>>> drwx------ 3 atchley softies 72 Nov 21 13:00 share
>>>>
>>>>
>>>> % ls -l $OMPI/bin
>>>> total 340
>>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 mpiCC -> opal_wrapper
>>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 mpic++ -> opal_wrapper
>>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 mpicc -> opal_wrapper
>>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 mpicxx -> opal_wrapper
>>>> lrwxrwxrwx 1 atchley softies      7 Nov 21 13:01 mpiexec -> orterun
>>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 mpif77 -> opal_wrapper
>>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 mpif90 -> opal_wrapper
>>>> lrwxrwxrwx 1 atchley softies      7 Nov 21 13:01 mpirun -> orterun
>>>> -rwxr-xr-x 1 atchley softies 138416 Nov 21 13:01 ompi_info
>>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:00 opalCC -> opal_wrapper
>>>> -rwxr-xr-x 1 atchley softies  24119 Nov 21 13:00 opal_wrapper
>>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:00 opalc++ -> opal_wrapper
>>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:00 opalcc -> opal_wrapper
>>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 orteCC -> opal_wrapper
>>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 ortec++ -> opal_wrapper
>>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 ortecc -> opal_wrapper
>>>> -rwxr-xr-x 1 atchley softies  26536 Nov 21 13:01 orted
>>>> -rwxr-xr-x 1 atchley softies 154770 Nov 21 13:01 orterun
>>>>
>>>> % ls -l $OMPI/lib
>>>> total 1741
>>>> -rwxr-xr-x 1 atchley softies   1045 Nov 21 13:01 libmca_common_sm.la
>>>> lrwxrwxrwx 1 atchley softies     25 Nov 21 13:01 libmca_common_sm.so -> libmca_common_sm.so.0.0.0
>>>> lrwxrwxrwx 1 atchley softies     25 Nov 21 13:01 libmca_common_sm.so.0 -> libmca_common_sm.so.0.0.0
>>>> -rwxr-xr-x 1 atchley softies  10074 Nov 21 13:01 libmca_common_sm.so.0.0.0
>>>> -rwxr-xr-x 1 atchley softies   1100 Nov 21 13:01 libmpi.la
>>>> lrwxrwxrwx 1 atchley softies     15 Nov 21 13:01 libmpi.so -> libmpi.so.0.0.0
>>>> lrwxrwxrwx 1 atchley softies     15 Nov 21 13:01 libmpi.so.0 -> libmpi.so.0.0.0
>>>> -rwxr-xr-x 1 atchley softies 640672 Nov 21 13:01 libmpi.so.0.0.0
>>>> -rwxr-xr-x 1 atchley softies   1005 Nov 21 13:01 libmpi_cxx.la
>>>> lrwxrwxrwx 1 atchley softies     19 Nov 21 13:01 libmpi_cxx.so -> libmpi_cxx.so.0.0.0
>>>> lrwxrwxrwx 1 atchley softies     19 Nov 21 13:01 libmpi_cxx.so.0 -> libmpi_cxx.so.0.0.0
>>>> -rwxr-xr-x 1 atchley softies 142062 Nov 21 13:01 libmpi_cxx.so.0.0.0
>>>> -rwxr-xr-x 1 atchley softies   1009 Nov 21 13:01 libmpi_f77.la
>>>> lrwxrwxrwx 1 atchley softies     19 Nov 21 13:01 libmpi_f77.so -> libmpi_f77.so.0.0.0
>>>> lrwxrwxrwx 1 atchley softies     19 Nov 21 13:01 libmpi_f77.so.0 -> libmpi_f77.so.0.0.0
>>>> -rwxr-xr-x 1 atchley softies 283394 Nov 21 13:01 libmpi_f77.so.0.0.0
>>>> -rwxr-xr-x 1 atchley softies    996 Nov 21 13:00 libopal.la
>>>> lrwxrwxrwx 1 atchley softies     16 Nov 21 13:00 libopal.so -> libopal.so.0.0.0
>>>> lrwxrwxrwx 1 atchley softies     16 Nov 21 13:00 libopal.so.0 -> libopal.so.0.0.0
>>>> -rwxr-xr-x 1 atchley softies 285769 Nov 21 13:00 libopal.so.0.0.0
>>>> -rwxr-xr-x 1 atchley softies   1051 Nov 21 13:00 liborte.la
>>>> lrwxrwxrwx 1 atchley softies     16 Nov 21 13:00 liborte.so -> liborte.so.0.0.0
>>>> lrwxrwxrwx 1 atchley softies     16 Nov 21 13:00 liborte.so.0 -> liborte.so.0.0.0
>>>> -rwxr-xr-x 1 atchley softies 380223 Nov 21 13:00 liborte.so.0.0.0
>>>> drwx------ 2 atchley softies   4160 Nov 21 13:01 openmpi
>>>
>>
>>
>>
>