Open MPI User's Mailing List Archives


From: George Bosilca (bosilca_at_[hidden])
Date: 2006-11-27 15:25:30


On Nov 27, 2006, at 10:56 AM, Galen Shipman wrote:

> Note that MX is supported as both a BTL and an MTL; I would recommend
> using the MX MTL, as the performance is much better. If you are using
> GM you can only use OB1 or DR; I would recommend OB1, as DR is only
> available in the trunk and is still in development.

In fact it depends on what you're looking for. If your algorithm is
latency bound, then using the MTL is the right choice. If what you're
looking at is bandwidth, or if the MPI data types used are not
contiguous, then using the MX BTL will give you better performance.
Reading the three papers about the message layer in Open MPI might
give you a better understanding of how everything works inside.
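
As a rough sketch of the two paths (the component names here assume a
typical 1.2-era build where both the MX MTL and the MX BTL were compiled
in), the choice can be made at run time with MCA parameters:

   # latency-bound codes: MX MTL driven by the CM PML
   mpirun -np 2 -mca pml cm -mca mtl mx ./mpi-ping

   # bandwidth-bound or non-contiguous datatypes: MX BTL driven by OB1
   mpirun -np 2 -mca pml ob1 -mca btl self,sm,mx ./mpi-ping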

   Thanks,
     george.

>
> To choose a specific PML at runtime use the MCA parameter facilities,
> for example:
>
>
> mpirun -np 2 -mca pml cm ./mpi-ping
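
(The same selection can also be made outside of the mpirun command line;
a minimal sketch, assuming a bash-like shell and a default Open MPI
installation:)

   # per environment: any MCA parameter can be set as OMPI_MCA_<name>
   export OMPI_MCA_pml=cm
   mpirun -np 2 ./mpi-ping

   # per user: persist the setting in the MCA parameter file
   echo "pml = cm" >> $HOME/.openmpi/mca-params.conf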
>
>
>
>
>
> On Nov 27, 2006, at 7:48 AM, Brock Palen wrote:
>
>> Well, I'm not finding much good information on what 'pml' is, which
>> ones are available, which one is used by default, or how to switch
>> between them. Is there a paper someplace that describes this?
>>
>> Brock Palen
>> Center for Advanced Computing
>> brockp_at_[hidden]
>> (734)936-1985
>>
>>
>> On Nov 26, 2006, at 11:10 AM, Galen Shipman wrote:
>>
>>> Oh, just noticed you are using GM; PML CM is only available for MX...
>>> sorry.
>>> Galen
>>>
>>>
>>>
>>> On Nov 26, 2006, at 9:08 AM, Galen Shipman wrote:
>>>
>>>> I would suggest trying Open MPI 1.2b1 and PML CM. You can select
>>>> PML CM at runtime via:
>>>>
>>>> mpirun -mca pml cm
>>>>
>>>> Have you tried this?
>>>>
>>>> - Galen
>>>>
>>>>
>>>>
>>>> On Nov 21, 2006, at 12:28 PM, Scott Atchley wrote:
>>>>
>>>>> On Nov 21, 2006, at 1:27 PM, Brock Palen wrote:
>>>>>
>>>>>> I had sent a message two weeks ago about this problem and talked
>>>>>> with Jeff at SC06 about how it might not be an OMPI problem. But
>>>>>> working now with Myricom, it appears that it is a problem in both
>>>>>> lam-7.1.2 and openmpi-1.1.2/1.1.1. Basically, the results from an
>>>>>> HPL run are wrong, and it also causes a large number of packets to
>>>>>> be dropped by the fabric.
>>>>>>
>>>>>> This problem does not happen when using mpichgm; the number of
>>>>>> dropped packets does not go up. There is a ticket open with Myricom
>>>>>> on this. They are a member of the group working on OMPI, but I sent
>>>>>> this out just to bring the list up to date.
>>>>>>
>>>>>> If you have any questions, feel free to ask me. The details are in
>>>>>> the archive.
>>>>>>
>>>>>> Brock Palen
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I am looking into this at Myricom.
>>>>>
>>>>> So far, I have compiled OMPI version 1.2b1 using the
>>>>> --with-gm=/path/to/gm flag. I have compiled HPCC (contains HPL)
>>>>> using OMPI's mpicc. Trying to run hpcc fails with "Myrinet/GM on
>>>>> host fog33 was unable to find any NICs". See mpirun output below.
>>>>>
>>>>> I run gm_board_info and it finds two NICs.
>>>>>
>>>>> I run ompi_info and it has the gm btl (see ompi_info below).
>>>>>
>>>>> I have tried using the --prefix flag to mpirun as well as setting
>>>>> PATH and LD_LIBRARY_PATH.
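
(For reference, a minimal sketch of that setup, assuming $OMPI points at
the Open MPI install prefix shown in the listings below:)

   export PATH=$OMPI/bin:$PATH
   export LD_LIBRARY_PATH=$OMPI/lib:$LD_LIBRARY_PATH
   mpirun --prefix $OMPI ...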
>>>>>
>>>>> What am I missing?
>>>>>
>>>>> Scott
>>>>>
>>>>>
>>>>> % ompi_info -param btl gm
>>>>>            MCA btl: parameter "btl_base_debug" (current value: "0")
>>>>>                     If btl_base_debug is 1 standard debug is output,
>>>>>                     if > 1 verbose debug is output
>>>>>            MCA btl: parameter "btl" (current value: <none>)
>>>>>                     Default selection set of components for the btl
>>>>>                     framework (<none> means "use all components that
>>>>>                     can be found")
>>>>>            MCA btl: parameter "btl_base_verbose" (current value: "0")
>>>>>                     Verbosity level for the btl framework (0 = no
>>>>>                     verbosity)
>>>>>            MCA btl: parameter "btl_gm_free_list_num" (current value: "8")
>>>>>            MCA btl: parameter "btl_gm_free_list_max" (current value: "-1")
>>>>>            MCA btl: parameter "btl_gm_free_list_inc" (current value: "8")
>>>>>            MCA btl: parameter "btl_gm_debug" (current value: "0")
>>>>>            MCA btl: parameter "btl_gm_mpool" (current value: "gm")
>>>>>            MCA btl: parameter "btl_gm_max_ports" (current value: "16")
>>>>>            MCA btl: parameter "btl_gm_max_boards" (current value: "4")
>>>>>            MCA btl: parameter "btl_gm_max_modules" (current value: "4")
>>>>>            MCA btl: parameter "btl_gm_num_high_priority" (current value: "8")
>>>>>            MCA btl: parameter "btl_gm_num_repost" (current value: "4")
>>>>>            MCA btl: parameter "btl_gm_port_name" (current value: "OMPI")
>>>>>            MCA btl: parameter "btl_gm_exclusivity" (current value: "1024")
>>>>>            MCA btl: parameter "btl_gm_eager_limit" (current value: "32768")
>>>>>            MCA btl: parameter "btl_gm_min_send_size" (current value: "32768")
>>>>>            MCA btl: parameter "btl_gm_max_send_size" (current value: "65536")
>>>>>            MCA btl: parameter "btl_gm_min_rdma_size" (current value: "524288")
>>>>>            MCA btl: parameter "btl_gm_max_rdma_size" (current value: "131072")
>>>>>            MCA btl: parameter "btl_gm_flags" (current value: "50")
>>>>>            MCA btl: parameter "btl_gm_bandwidth" (current value: "250")
>>>>>            MCA btl: parameter "btl_gm_priority" (current value: "0")
>>>>>            MCA btl: parameter "btl_base_warn_component_unused" (current value: "1")
>>>>>                     This parameter is used to turn on warning messages
>>>>>                     when certain NICs are not used
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> % mpirun --prefix $OMPI -np 4 --host fog33,fog33,fog34,fog34 -mca btl self,sm,gm ./hpcc
>>>>> --------------------------------------------------------------------------
>>>>> [0,1,1]: Myrinet/GM on host fog33 was unable to find any NICs.
>>>>> Another transport will be used instead, although this may result in
>>>>> lower performance.
>>>>> --------------------------------------------------------------------------
>>>>> --------------------------------------------------------------------------
>>>>> [0,1,0]: Myrinet/GM on host fog33 was unable to find any NICs.
>>>>> Another transport will be used instead, although this may result in
>>>>> lower performance.
>>>>> --------------------------------------------------------------------------
>>>>> --------------------------------------------------------------------------
>>>>> Process 0.1.3 is unable to reach 0.1.0 for MPI communication.
>>>>> If you specified the use of a BTL component, you may have
>>>>> forgotten a component (such as "self") in the list of
>>>>> usable components.
>>>>> --------------------------------------------------------------------------
>>>>> --------------------------------------------------------------------------
>>>>> Process 0.1.1 is unable to reach 0.1.2 for MPI communication.
>>>>> If you specified the use of a BTL component, you may have
>>>>> forgotten a component (such as "self") in the list of
>>>>> usable components.
>>>>> --------------------------------------------------------------------------
>>>>> --------------------------------------------------------------------------
>>>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>>>> likely to abort. There are many reasons that a parallel process can
>>>>> fail during MPI_INIT; some of which are due to configuration or
>>>>> environment problems. This failure appears to be an internal failure;
>>>>> here's some additional information (which may only be relevant to an
>>>>> Open MPI developer):
>>>>>
>>>>>   PML add procs failed
>>>>>   --> Returned "Unreachable" (-12) instead of "Success" (0)
>>>>> --------------------------------------------------------------------------
>>>>> *** An error occurred in MPI_Init
>>>>> *** before MPI was initialized
>>>>> *** MPI_ERRORS_ARE_FATAL (goodbye)
>>>>>
>>>>>
>>>>>
>>>>> % ls -l $OMPI
>>>>> total 1
>>>>> drwx------ 2 atchley softies 496 Nov 21 13:01 bin
>>>>> drwx------ 2 atchley softies 168 Nov 21 13:01 etc
>>>>> drwx------ 3 atchley softies 184 Nov 21 13:01 include
>>>>> drwx------ 3 atchley softies 896 Nov 21 13:01 lib
>>>>> drwx------ 4 atchley softies 96 Nov 21 13:01 man
>>>>> drwx------ 3 atchley softies 72 Nov 21 13:00 share
>>>>>
>>>>>
>>>>> % ls -l $OMPI/bin
>>>>> total 340
>>>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 mpiCC -> opal_wrapper
>>>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 mpic++ -> opal_wrapper
>>>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 mpicc -> opal_wrapper
>>>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 mpicxx -> opal_wrapper
>>>>> lrwxrwxrwx 1 atchley softies      7 Nov 21 13:01 mpiexec -> orterun
>>>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 mpif77 -> opal_wrapper
>>>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 mpif90 -> opal_wrapper
>>>>> lrwxrwxrwx 1 atchley softies      7 Nov 21 13:01 mpirun -> orterun
>>>>> -rwxr-xr-x 1 atchley softies 138416 Nov 21 13:01 ompi_info
>>>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:00 opalCC -> opal_wrapper
>>>>> -rwxr-xr-x 1 atchley softies  24119 Nov 21 13:00 opal_wrapper
>>>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:00 opalc++ -> opal_wrapper
>>>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:00 opalcc -> opal_wrapper
>>>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 orteCC -> opal_wrapper
>>>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 ortec++ -> opal_wrapper
>>>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 ortecc -> opal_wrapper
>>>>> -rwxr-xr-x 1 atchley softies  26536 Nov 21 13:01 orted
>>>>> -rwxr-xr-x 1 atchley softies 154770 Nov 21 13:01 orterun
>>>>>
>>>>> % ls -l $OMPI/lib
>>>>> total 1741
>>>>> -rwxr-xr-x 1 atchley softies   1045 Nov 21 13:01 libmca_common_sm.la
>>>>> lrwxrwxrwx 1 atchley softies     25 Nov 21 13:01 libmca_common_sm.so -> libmca_common_sm.so.0.0.0
>>>>> lrwxrwxrwx 1 atchley softies     25 Nov 21 13:01 libmca_common_sm.so.0 -> libmca_common_sm.so.0.0.0
>>>>> -rwxr-xr-x 1 atchley softies  10074 Nov 21 13:01 libmca_common_sm.so.0.0.0
>>>>> -rwxr-xr-x 1 atchley softies   1100 Nov 21 13:01 libmpi.la
>>>>> lrwxrwxrwx 1 atchley softies     15 Nov 21 13:01 libmpi.so -> libmpi.so.0.0.0
>>>>> lrwxrwxrwx 1 atchley softies     15 Nov 21 13:01 libmpi.so.0 -> libmpi.so.0.0.0
>>>>> -rwxr-xr-x 1 atchley softies 640672 Nov 21 13:01 libmpi.so.0.0.0
>>>>> -rwxr-xr-x 1 atchley softies   1005 Nov 21 13:01 libmpi_cxx.la
>>>>> lrwxrwxrwx 1 atchley softies     19 Nov 21 13:01 libmpi_cxx.so -> libmpi_cxx.so.0.0.0
>>>>> lrwxrwxrwx 1 atchley softies     19 Nov 21 13:01 libmpi_cxx.so.0 -> libmpi_cxx.so.0.0.0
>>>>> -rwxr-xr-x 1 atchley softies 142062 Nov 21 13:01 libmpi_cxx.so.0.0.0
>>>>> -rwxr-xr-x 1 atchley softies   1009 Nov 21 13:01 libmpi_f77.la
>>>>> lrwxrwxrwx 1 atchley softies     19 Nov 21 13:01 libmpi_f77.so -> libmpi_f77.so.0.0.0
>>>>> lrwxrwxrwx 1 atchley softies     19 Nov 21 13:01 libmpi_f77.so.0 -> libmpi_f77.so.0.0.0
>>>>> -rwxr-xr-x 1 atchley softies 283394 Nov 21 13:01 libmpi_f77.so.0.0.0
>>>>> -rwxr-xr-x 1 atchley softies    996 Nov 21 13:00 libopal.la
>>>>> lrwxrwxrwx 1 atchley softies     16 Nov 21 13:00 libopal.so -> libopal.so.0.0.0
>>>>> lrwxrwxrwx 1 atchley softies     16 Nov 21 13:00 libopal.so.0 -> libopal.so.0.0.0
>>>>> -rwxr-xr-x 1 atchley softies 285769 Nov 21 13:00 libopal.so.0.0.0
>>>>> -rwxr-xr-x 1 atchley softies   1051 Nov 21 13:00 liborte.la
>>>>> lrwxrwxrwx 1 atchley softies     16 Nov 21 13:00 liborte.so -> liborte.so.0.0.0
>>>>> lrwxrwxrwx 1 atchley softies     16 Nov 21 13:00 liborte.so.0 -> liborte.so.0.0.0
>>>>> -rwxr-xr-x 1 atchley softies 380223 Nov 21 13:00 liborte.so.0.0.0
>>>>> drwx------ 2 atchley softies   4160 Nov 21 13:01 openmpi
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users