Open MPI User's Mailing List Archives

From: Galen Shipman (gshipman_at_[hidden])
Date: 2006-11-26 11:10:18


Oh, I just noticed you are using GM; PML CM is only available for MX.
Sorry.
Galen
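
(A quick way to check which PMLs and MTLs a given build actually
contains, for anyone following along; the cm PML drives MTL components,
and MX support shows up as the mx MTL:

  ompi_info | grep -E "pml|mtl"

On a GM-only build no mx MTL is listed, so cm has nothing to use.)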

On Nov 26, 2006, at 9:08 AM, Galen Shipman wrote:

> I would suggest trying Open MPI 1.2b1 and PML CM. You can select
> PML CM at runtime via:
>
> mpirun -mca pml cm
>
> Have you tried this?
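>
> (Purely as an illustrative sketch, and noting the caveat above that cm
> needs MX: the same selection can also be made through the environment,
> since every MCA parameter maps to an OMPI_MCA_-prefixed variable, e.g.
>
>   export OMPI_MCA_pml=cm
>   mpirun -np 4 ./hpcc
>
> assuming a bash-style shell; either form overrides the default ob1 PML.)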
>
> - Galen
>
>
>
> On Nov 21, 2006, at 12:28 PM, Scott Atchley wrote:
>
>> On Nov 21, 2006, at 1:27 PM, Brock Palen wrote:
>>
>>> I had sent a message two weeks ago about this problem and talked
>>> with Jeff at SC06 about how it might not be an OMPI problem. But
>>> working with Myricom, it now appears that it is a problem in both
>>> lam-7.1.2 and openmpi-1.1.2/1.1.1. Basically, the results from an
>>> HPL run are wrong, and it also causes a large number of packets to
>>> be dropped by the fabric.
>>>
>>> This problem does not happen when using mpichgm; the number of
>>> dropped packets does not go up. There is a ticket open with Myricom
>>> on this. They are a member of the group working on OMPI, but I sent
>>> this out just to bring the list up to date.
>>>
>>> If you have any questions feel free to ask me. The details are in
>>> the archive.
>>>
>>> Brock Palen
>>
>> Hi all,
>>
>> I am looking into this at Myricom.
>>
>> So far, I have compiled OMPI version 1.2b1 using the
>> --with-gm=/path/to/gm flag. I have compiled HPCC (contains HPL)
>> using OMPI's mpicc. Trying to run hpcc fails with "Myrinet/GM on
>> host fog33 was unable to find any NICs". See mpirun output below.
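>>
>> (For reference, a minimal sketch of that build plus a quick sanity
>> check; the GM install path is just a placeholder:
>>
>> % ./configure --prefix=$OMPI --with-gm=/path/to/gm
>> % make all install
>> % $OMPI/bin/ompi_info | grep gm
>>
>> If the gm btl was built, ompi_info lists it, as it does below.)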
>>
>> I run gm_board_info and it finds two NICs.
>>
>> I run ompi_info and it has the gm btl (see ompi_info below).
>>
>> I have tried using the --prefix flag to mpirun as well as setting
>> PATH and LD_LIBRARY_PATH.
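>>
>> (A minimal sketch of that environment setup, assuming a bash-style
>> shell and $OMPI pointing at the install prefix listed below:
>>
>> % export PATH=$OMPI/bin:$PATH
>> % export LD_LIBRARY_PATH=$OMPI/lib:$LD_LIBRARY_PATH
>>
>> The --prefix option to mpirun is meant to achieve the same effect on
>> the remote nodes.)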
>>
>> What am I missing?
>>
>> Scott
>>
>>
>> % ompi_info -param btl gm
>> MCA btl: parameter "btl_base_debug" (current value: "0")
>>          If btl_base_debug is 1 standard debug is output, if > 1
>>          verbose debug is output
>> MCA btl: parameter "btl" (current value: <none>)
>>          Default selection set of components for the btl framework
>>          (<none> means "use all components that can be found")
>> MCA btl: parameter "btl_base_verbose" (current value: "0")
>>          Verbosity level for the btl framework (0 = no verbosity)
>> MCA btl: parameter "btl_gm_free_list_num" (current value: "8")
>> MCA btl: parameter "btl_gm_free_list_max" (current value: "-1")
>> MCA btl: parameter "btl_gm_free_list_inc" (current value: "8")
>> MCA btl: parameter "btl_gm_debug" (current value: "0")
>> MCA btl: parameter "btl_gm_mpool" (current value: "gm")
>> MCA btl: parameter "btl_gm_max_ports" (current value: "16")
>> MCA btl: parameter "btl_gm_max_boards" (current value: "4")
>> MCA btl: parameter "btl_gm_max_modules" (current value: "4")
>> MCA btl: parameter "btl_gm_num_high_priority" (current value: "8")
>> MCA btl: parameter "btl_gm_num_repost" (current value: "4")
>> MCA btl: parameter "btl_gm_port_name" (current value: "OMPI")
>> MCA btl: parameter "btl_gm_exclusivity" (current value: "1024")
>> MCA btl: parameter "btl_gm_eager_limit" (current value: "32768")
>> MCA btl: parameter "btl_gm_min_send_size" (current value: "32768")
>> MCA btl: parameter "btl_gm_max_send_size" (current value: "65536")
>> MCA btl: parameter "btl_gm_min_rdma_size" (current value: "524288")
>> MCA btl: parameter "btl_gm_max_rdma_size" (current value: "131072")
>> MCA btl: parameter "btl_gm_flags" (current value: "50")
>> MCA btl: parameter "btl_gm_bandwidth" (current value: "250")
>> MCA btl: parameter "btl_gm_priority" (current value: "0")
>> MCA btl: parameter "btl_base_warn_component_unused" (current value: "1")
>>          This parameter is used to turn on warning messages when
>>          certain NICs are not used
>>
>>
>>
>>
>>
>> % mpirun --prefix $OMPI -np 4 --host fog33,fog33,fog34,fog34 -mca btl
>> self,sm,gm ./hpcc
>> --------------------------------------------------------------------------
>> [0,1,1]: Myrinet/GM on host fog33 was unable to find any NICs.
>> Another transport will be used instead, although this may result in
>> lower performance.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> [0,1,0]: Myrinet/GM on host fog33 was unable to find any NICs.
>> Another transport will be used instead, although this may result in
>> lower performance.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> Process 0.1.3 is unable to reach 0.1.0 for MPI communication.
>> If you specified the use of a BTL component, you may have
>> forgotten a component (such as "self") in the list of
>> usable components.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> Process 0.1.1 is unable to reach 0.1.2 for MPI communication.
>> If you specified the use of a BTL component, you may have
>> forgotten a component (such as "self") in the list of
>> usable components.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or
>> environment problems. This failure appears to be an internal failure;
>> here's some additional information (which may only be relevant to an
>> Open MPI developer):
>>
>> PML add procs failed
>> --> Returned "Unreachable" (-12) instead of "Success" (0)
>> --------------------------------------------------------------------------
>> *** An error occurred in MPI_Init
>> *** before MPI was initialized
>> *** MPI_ERRORS_ARE_FATAL (goodbye)
>>
>>
>>
>> % ls -l $OMPI
>> total 1
>> drwx------ 2 atchley softies 496 Nov 21 13:01 bin
>> drwx------ 2 atchley softies 168 Nov 21 13:01 etc
>> drwx------ 3 atchley softies 184 Nov 21 13:01 include
>> drwx------ 3 atchley softies 896 Nov 21 13:01 lib
>> drwx------ 4 atchley softies 96 Nov 21 13:01 man
>> drwx------ 3 atchley softies 72 Nov 21 13:00 share
>>
>>
>> % ls -l $OMPI/bin
>> total 340
>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 mpiCC -> opal_wrapper
>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 mpic++ -> opal_wrapper
>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 mpicc -> opal_wrapper
>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 mpicxx -> opal_wrapper
>> lrwxrwxrwx 1 atchley softies      7 Nov 21 13:01 mpiexec -> orterun
>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 mpif77 -> opal_wrapper
>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 mpif90 -> opal_wrapper
>> lrwxrwxrwx 1 atchley softies      7 Nov 21 13:01 mpirun -> orterun
>> -rwxr-xr-x 1 atchley softies 138416 Nov 21 13:01 ompi_info
>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:00 opalCC -> opal_wrapper
>> -rwxr-xr-x 1 atchley softies  24119 Nov 21 13:00 opal_wrapper
>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:00 opalc++ -> opal_wrapper
>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:00 opalcc -> opal_wrapper
>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 orteCC -> opal_wrapper
>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 ortec++ -> opal_wrapper
>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 ortecc -> opal_wrapper
>> -rwxr-xr-x 1 atchley softies  26536 Nov 21 13:01 orted
>> -rwxr-xr-x 1 atchley softies 154770 Nov 21 13:01 orterun
>>
>> % ls -l $OMPI/lib
>> total 1741
>> -rwxr-xr-x 1 atchley softies   1045 Nov 21 13:01 libmca_common_sm.la
>> lrwxrwxrwx 1 atchley softies     25 Nov 21 13:01 libmca_common_sm.so -> libmca_common_sm.so.0.0.0
>> lrwxrwxrwx 1 atchley softies     25 Nov 21 13:01 libmca_common_sm.so.0 -> libmca_common_sm.so.0.0.0
>> -rwxr-xr-x 1 atchley softies  10074 Nov 21 13:01 libmca_common_sm.so.0.0.0
>> -rwxr-xr-x 1 atchley softies   1100 Nov 21 13:01 libmpi.la
>> lrwxrwxrwx 1 atchley softies     15 Nov 21 13:01 libmpi.so -> libmpi.so.0.0.0
>> lrwxrwxrwx 1 atchley softies     15 Nov 21 13:01 libmpi.so.0 -> libmpi.so.0.0.0
>> -rwxr-xr-x 1 atchley softies 640672 Nov 21 13:01 libmpi.so.0.0.0
>> -rwxr-xr-x 1 atchley softies   1005 Nov 21 13:01 libmpi_cxx.la
>> lrwxrwxrwx 1 atchley softies     19 Nov 21 13:01 libmpi_cxx.so -> libmpi_cxx.so.0.0.0
>> lrwxrwxrwx 1 atchley softies     19 Nov 21 13:01 libmpi_cxx.so.0 -> libmpi_cxx.so.0.0.0
>> -rwxr-xr-x 1 atchley softies 142062 Nov 21 13:01 libmpi_cxx.so.0.0.0
>> -rwxr-xr-x 1 atchley softies   1009 Nov 21 13:01 libmpi_f77.la
>> lrwxrwxrwx 1 atchley softies     19 Nov 21 13:01 libmpi_f77.so -> libmpi_f77.so.0.0.0
>> lrwxrwxrwx 1 atchley softies     19 Nov 21 13:01 libmpi_f77.so.0 -> libmpi_f77.so.0.0.0
>> -rwxr-xr-x 1 atchley softies 283394 Nov 21 13:01 libmpi_f77.so.0.0.0
>> -rwxr-xr-x 1 atchley softies    996 Nov 21 13:00 libopal.la
>> lrwxrwxrwx 1 atchley softies     16 Nov 21 13:00 libopal.so -> libopal.so.0.0.0
>> lrwxrwxrwx 1 atchley softies     16 Nov 21 13:00 libopal.so.0 -> libopal.so.0.0.0
>> -rwxr-xr-x 1 atchley softies 285769 Nov 21 13:00 libopal.so.0.0.0
>> -rwxr-xr-x 1 atchley softies   1051 Nov 21 13:00 liborte.la
>> lrwxrwxrwx 1 atchley softies     16 Nov 21 13:00 liborte.so -> liborte.so.0.0.0
>> lrwxrwxrwx 1 atchley softies     16 Nov 21 13:00 liborte.so.0 -> liborte.so.0.0.0
>> -rwxr-xr-x 1 atchley softies 380223 Nov 21 13:00 liborte.so.0.0.0
>> drwx------ 2 atchley softies   4160 Nov 21 13:01 openmpi
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>