
Open MPI User's Mailing List Archives


From: Brock Palen (brockp_at_[hidden])
Date: 2006-11-27 09:48:41


Well, I'm not finding much good information on what 'pml' is, which
ones are available, which one is used by default, or how to switch
between them. Is there a paper someplace that describes this?
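
(A rough sketch of what I am assuming the selection mechanism looks
like, so corrections are welcome: the installed PML components and
their parameters can apparently be listed with

    ompi_info -param pml all

and a specific one forced at run time the same way Galen showed for
cm, e.g.

    mpirun -mca pml ob1 -np 2 ./a.out

with ob1, the BTL-based PML, presumably being the default.)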

Brock Palen
Center for Advanced Computing
brockp_at_[hidden]
(734)936-1985

On Nov 26, 2006, at 11:10 AM, Galen Shipman wrote:

> Oh, I just noticed you are using GM; PML CM is only available for
> MX. Sorry.
> Galen
>
>
>
> On Nov 26, 2006, at 9:08 AM, Galen Shipman wrote:
>
>> I would suggest trying Open MPI 1.2b1 and PML CM. You can select
>> PML CM at runtime via:
>>
>> mpirun -mca pml cm
>>
>> Have you tried this?
>>
>> - Galen
>>
>>
>>
>> On Nov 21, 2006, at 12:28 PM, Scott Atchley wrote:
>>
>>> On Nov 21, 2006, at 1:27 PM, Brock Palen wrote:
>>>
>>>> I had sent a message two weeks ago about this problem and talked
>>>> with Jeff at SC06 about how it might not be an OMPI problem. But
>>>> working with Myricom, it now appears that it is a problem in both
>>>> LAM 7.1.2 and Open MPI 1.1.2/1.1.1. Basically, the results from an
>>>> HPL run are wrong, and it also causes a large number of packets to
>>>> be dropped by the fabric.
>>>>
>>>> This problem does not happen when using MPICH-GM; the number of
>>>> dropped packets does not go up. There is a ticket open with Myricom
>>>> on this. They are a member of the group working on OMPI, but I sent
>>>> this out just to bring the list up to date.
>>>>
>>>> If you have any questions, feel free to ask me. The details are in
>>>> the archive.
>>>>
>>>> Brock Palen
>>>
>>> Hi all,
>>>
>>> I am looking into this at Myricom.
>>>
>>> So far, I have compiled OMPI version 1.2b1 using the
>>> --with-gm=/path/to/gm flag. I have compiled HPCC (contains HPL)
>>> using OMPI's mpicc. Trying to run hpcc fails with "Myrinet/GM on
>>> host fog33 was unable to find any NICs". See mpirun output below.
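>>>
>>> (Roughly, the build went along these lines; the GM path is a
>>> placeholder and $OMPI is the install prefix shown further down:
>>>
>>>     ./configure --prefix=$OMPI --with-gm=/path/to/gm
>>>     make all install
>>>
>>> with HPCC then built against $OMPI/bin/mpicc.)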
>>>
>>> I run gm_board_info and it finds two NICs.
>>>
>>> I run ompi_info and it has the gm btl (see ompi_info below).
>>>
>>> I have tried using the --prefix flag to mpirun as well as setting
>>> PATH and LD_LIBRARY_PATH.
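>>>
>>> (Concretely, something like the following, assuming a Bourne-style
>>> shell; $OMPI is the install prefix listed below:
>>>
>>>     export PATH=$OMPI/bin:$PATH
>>>     export LD_LIBRARY_PATH=$OMPI/lib:$LD_LIBRARY_PATH
>>>
>>> versus passing --prefix $OMPI to mpirun as in the command below.)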
>>>
>>> What am I missing?
>>>
>>> Scott
>>>
>>>
>>> % ompi_info -param btl gm
>>> MCA btl: parameter "btl_base_debug" (current value: "0")
>>>          If btl_base_debug is 1 standard debug is output, if > 1 verbose debug is output
>>> MCA btl: parameter "btl" (current value: <none>)
>>>          Default selection set of components for the btl framework (<none> means "use all components that can be found")
>>> MCA btl: parameter "btl_base_verbose" (current value: "0")
>>>          Verbosity level for the btl framework (0 = no verbosity)
>>> MCA btl: parameter "btl_gm_free_list_num" (current value: "8")
>>> MCA btl: parameter "btl_gm_free_list_max" (current value: "-1")
>>> MCA btl: parameter "btl_gm_free_list_inc" (current value: "8")
>>> MCA btl: parameter "btl_gm_debug" (current value: "0")
>>> MCA btl: parameter "btl_gm_mpool" (current value: "gm")
>>> MCA btl: parameter "btl_gm_max_ports" (current value: "16")
>>> MCA btl: parameter "btl_gm_max_boards" (current value: "4")
>>> MCA btl: parameter "btl_gm_max_modules" (current value: "4")
>>> MCA btl: parameter "btl_gm_num_high_priority" (current value: "8")
>>> MCA btl: parameter "btl_gm_num_repost" (current value: "4")
>>> MCA btl: parameter "btl_gm_port_name" (current value: "OMPI")
>>> MCA btl: parameter "btl_gm_exclusivity" (current value: "1024")
>>> MCA btl: parameter "btl_gm_eager_limit" (current value: "32768")
>>> MCA btl: parameter "btl_gm_min_send_size" (current value: "32768")
>>> MCA btl: parameter "btl_gm_max_send_size" (current value: "65536")
>>> MCA btl: parameter "btl_gm_min_rdma_size" (current value: "524288")
>>> MCA btl: parameter "btl_gm_max_rdma_size" (current value: "131072")
>>> MCA btl: parameter "btl_gm_flags" (current value: "50")
>>> MCA btl: parameter "btl_gm_bandwidth" (current value: "250")
>>> MCA btl: parameter "btl_gm_priority" (current value: "0")
>>> MCA btl: parameter "btl_base_warn_component_unused" (current value: "1")
>>>          This parameter is used to turn on warning messages when certain NICs are not used
>>>
>>>
>>>
>>>
>>>
>>> % mpirun --prefix $OMPI -np 4 --host fog33,fog33,fog34,fog34 -mca btl self,sm,gm ./hpcc
>>> --------------------------------------------------------------------------
>>> [0,1,1]: Myrinet/GM on host fog33 was unable to find any NICs.
>>> Another transport will be used instead, although this may result in
>>> lower performance.
>>> --------------------------------------------------------------------------
>>> --------------------------------------------------------------------------
>>> [0,1,0]: Myrinet/GM on host fog33 was unable to find any NICs.
>>> Another transport will be used instead, although this may result in
>>> lower performance.
>>> --------------------------------------------------------------------------
>>> --------------------------------------------------------------------------
>>> Process 0.1.3 is unable to reach 0.1.0 for MPI communication.
>>> If you specified the use of a BTL component, you may have
>>> forgotten a component (such as "self") in the list of
>>> usable components.
>>> --------------------------------------------------------------------------
>>> --------------------------------------------------------------------------
>>> Process 0.1.1 is unable to reach 0.1.2 for MPI communication.
>>> If you specified the use of a BTL component, you may have
>>> forgotten a component (such as "self") in the list of
>>> usable components.
>>> --------------------------------------------------------------------------
>>> --------------------------------------------------------------------------
>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during MPI_INIT; some of which are due to configuration or
>>> environment problems. This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>>
>>> PML add procs failed
>>> --> Returned "Unreachable" (-12) instead of "Success" (0)
>>> --------------------------------------------------------------------------
>>> *** An error occurred in MPI_Init
>>> *** before MPI was initialized
>>> *** MPI_ERRORS_ARE_FATAL (goodbye)
>>>
>>>
>>>
>>> % ls -l $OMPI
>>> total 1
>>> drwx------ 2 atchley softies 496 Nov 21 13:01 bin
>>> drwx------ 2 atchley softies 168 Nov 21 13:01 etc
>>> drwx------ 3 atchley softies 184 Nov 21 13:01 include
>>> drwx------ 3 atchley softies 896 Nov 21 13:01 lib
>>> drwx------ 4 atchley softies 96 Nov 21 13:01 man
>>> drwx------ 3 atchley softies 72 Nov 21 13:00 share
>>>
>>>
>>> % ls -l $OMPI/bin
>>> total 340
>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 mpiCC -> opal_wrapper
>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 mpic++ -> opal_wrapper
>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 mpicc -> opal_wrapper
>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 mpicxx -> opal_wrapper
>>> lrwxrwxrwx 1 atchley softies      7 Nov 21 13:01 mpiexec -> orterun
>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 mpif77 -> opal_wrapper
>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 mpif90 -> opal_wrapper
>>> lrwxrwxrwx 1 atchley softies      7 Nov 21 13:01 mpirun -> orterun
>>> -rwxr-xr-x 1 atchley softies 138416 Nov 21 13:01 ompi_info
>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:00 opalCC -> opal_wrapper
>>> -rwxr-xr-x 1 atchley softies  24119 Nov 21 13:00 opal_wrapper
>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:00 opalc++ -> opal_wrapper
>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:00 opalcc -> opal_wrapper
>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 orteCC -> opal_wrapper
>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 ortec++ -> opal_wrapper
>>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 ortecc -> opal_wrapper
>>> -rwxr-xr-x 1 atchley softies  26536 Nov 21 13:01 orted
>>> -rwxr-xr-x 1 atchley softies 154770 Nov 21 13:01 orterun
>>>
>>> % ls -l $OMPI/lib
>>> total 1741
>>> -rwxr-xr-x 1 atchley softies   1045 Nov 21 13:01 libmca_common_sm.la
>>> lrwxrwxrwx 1 atchley softies     25 Nov 21 13:01 libmca_common_sm.so -> libmca_common_sm.so.0.0.0
>>> lrwxrwxrwx 1 atchley softies     25 Nov 21 13:01 libmca_common_sm.so.0 -> libmca_common_sm.so.0.0.0
>>> -rwxr-xr-x 1 atchley softies  10074 Nov 21 13:01 libmca_common_sm.so.0.0.0
>>> -rwxr-xr-x 1 atchley softies   1100 Nov 21 13:01 libmpi.la
>>> lrwxrwxrwx 1 atchley softies     15 Nov 21 13:01 libmpi.so -> libmpi.so.0.0.0
>>> lrwxrwxrwx 1 atchley softies     15 Nov 21 13:01 libmpi.so.0 -> libmpi.so.0.0.0
>>> -rwxr-xr-x 1 atchley softies 640672 Nov 21 13:01 libmpi.so.0.0.0
>>> -rwxr-xr-x 1 atchley softies   1005 Nov 21 13:01 libmpi_cxx.la
>>> lrwxrwxrwx 1 atchley softies     19 Nov 21 13:01 libmpi_cxx.so -> libmpi_cxx.so.0.0.0
>>> lrwxrwxrwx 1 atchley softies     19 Nov 21 13:01 libmpi_cxx.so.0 -> libmpi_cxx.so.0.0.0
>>> -rwxr-xr-x 1 atchley softies 142062 Nov 21 13:01 libmpi_cxx.so.0.0.0
>>> -rwxr-xr-x 1 atchley softies   1009 Nov 21 13:01 libmpi_f77.la
>>> lrwxrwxrwx 1 atchley softies     19 Nov 21 13:01 libmpi_f77.so -> libmpi_f77.so.0.0.0
>>> lrwxrwxrwx 1 atchley softies     19 Nov 21 13:01 libmpi_f77.so.0 -> libmpi_f77.so.0.0.0
>>> -rwxr-xr-x 1 atchley softies 283394 Nov 21 13:01 libmpi_f77.so.0.0.0
>>> -rwxr-xr-x 1 atchley softies    996 Nov 21 13:00 libopal.la
>>> lrwxrwxrwx 1 atchley softies     16 Nov 21 13:00 libopal.so -> libopal.so.0.0.0
>>> lrwxrwxrwx 1 atchley softies     16 Nov 21 13:00 libopal.so.0 -> libopal.so.0.0.0
>>> -rwxr-xr-x 1 atchley softies 285769 Nov 21 13:00 libopal.so.0.0.0
>>> -rwxr-xr-x 1 atchley softies   1051 Nov 21 13:00 liborte.la
>>> lrwxrwxrwx 1 atchley softies     16 Nov 21 13:00 liborte.so -> liborte.so.0.0.0
>>> lrwxrwxrwx 1 atchley softies     16 Nov 21 13:00 liborte.so.0 -> liborte.so.0.0.0
>>> -rwxr-xr-x 1 atchley softies 380223 Nov 21 13:00 liborte.so.0.0.0
>>> drwx------ 2 atchley softies   4160 Nov 21 13:01 openmpi