Open MPI User's Mailing List Archives

From: Galen Shipman (gshipman_at_[hidden])
Date: 2006-11-26 11:10:18


Oh, I just noticed you are using GM; PML CM is only available for MX.
Sorry.
Galen
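
(For reference, the PML is just another MCA parameter on the mpirun
command line, so the two alternatives discussed below look roughly like
this; hostnames and the executable are placeholders:

   mpirun -np 4 --host node1,node1,node2,node2 -mca pml cm ./a.out
   mpirun -np 4 --host node1,node1,node2,node2 -mca pml ob1 -mca btl self,sm,gm ./a.out

The first selects the CM PML, which as noted above requires MX; the
second keeps the default OB1 PML and names the GM BTL explicitly.)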

On Nov 26, 2006, at 9:08 AM, Galen Shipman wrote:

> I would suggest trying Open MPI 1.2b1 and PML CM. You can select
> PML CM at runtime via:
>
> mpirun -mca pml cm
>
> Have you tried this?
>
> - Galen
>
>
>
> On Nov 21, 2006, at 12:28 PM, Scott Atchley wrote:
>
>> On Nov 21, 2006, at 1:27 PM, Brock Palen wrote:
>>
>>> I had sent a message two weeks ago about this problem and talked
>>> with Jeff at SC06 about how it might not be an OMPI problem. But
>>> working with Myricom, it now appears that it is a problem in both
>>> lam-7.1.2 and openmpi-1.1.2/1.1.1. Basically, the results from an
>>> HPL run are wrong, and the run also causes a large number of
>>> packets to be dropped by the fabric.
>>>
>>> This problem does not happen when using mpichgm; the number of
>>> dropped packets does not go up. There is a ticket open with Myricom
>>> on this. They are a member of the group working on OMPI, but I sent
>>> this out just to bring the list up to date.
>>>
>>> If you have any questions feel free to ask me. The details are in
>>> the archive.
>>>
>>> Brock Palen
>>
>> Hi all,
>>
>> I am looking into this at Myricom.
>>
>> So far, I have compiled OMPI version 1.2b1 using the
>> --with-gm=/path/to/gm flag. I have compiled HPCC (contains HPL) using
>> OMPI's mpicc. Trying to run hpcc fails with "Myrinet/GM on host fog33
>> was unable to find any NICs". See mpirun output below.
>>
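>> (For reference, the Open MPI build itself was roughly of this form,
>> with $OMPI standing for the install directory used below and the GM
>> path left as the placeholder from above:
>>
>>    ./configure --prefix=$OMPI --with-gm=/path/to/gm
>>    make all install
>>
>> HPCC was then compiled with the mpicc wrapper from $OMPI/bin.)
>>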
>> I run gm_board_info and it finds two NICs.
>>
>> I run ompi_info and it has the gm btl (see ompi_info below).
>>
>> I have tried using the --prefix flag to mpirun as well as setting
>> PATH and LD_LIBRARY_PATH.
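>>
>> (Concretely, that amounted to something like the following, with
>> $OMPI again standing for the install directory:
>>
>>    export PATH=$OMPI/bin:$PATH
>>    export LD_LIBRARY_PATH=$OMPI/lib:$LD_LIBRARY_PATH
>>    mpirun --prefix $OMPI ...
>>
>> where the full mpirun command line is shown below.)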
>>
>> What am I missing?
>>
>> Scott
>>
>>
>> % ompi_info -param btl gm
>> MCA btl: parameter "btl_base_debug" (current value: "0")
>>          If btl_base_debug is 1 standard debug is output, if > 1 verbose debug is output
>> MCA btl: parameter "btl" (current value: <none>)
>>          Default selection set of components for the btl framework (<none> means "use all components that can be found")
>> MCA btl: parameter "btl_base_verbose" (current value: "0")
>>          Verbosity level for the btl framework (0 = no verbosity)
>> MCA btl: parameter "btl_gm_free_list_num" (current value: "8")
>> MCA btl: parameter "btl_gm_free_list_max" (current value: "-1")
>> MCA btl: parameter "btl_gm_free_list_inc" (current value: "8")
>> MCA btl: parameter "btl_gm_debug" (current value: "0")
>> MCA btl: parameter "btl_gm_mpool" (current value: "gm")
>> MCA btl: parameter "btl_gm_max_ports" (current value: "16")
>> MCA btl: parameter "btl_gm_max_boards" (current value: "4")
>> MCA btl: parameter "btl_gm_max_modules" (current value: "4")
>> MCA btl: parameter "btl_gm_num_high_priority" (current value: "8")
>> MCA btl: parameter "btl_gm_num_repost" (current value: "4")
>> MCA btl: parameter "btl_gm_port_name" (current value: "OMPI")
>> MCA btl: parameter "btl_gm_exclusivity" (current value: "1024")
>> MCA btl: parameter "btl_gm_eager_limit" (current value: "32768")
>> MCA btl: parameter "btl_gm_min_send_size" (current value: "32768")
>> MCA btl: parameter "btl_gm_max_send_size" (current value: "65536")
>> MCA btl: parameter "btl_gm_min_rdma_size" (current value: "524288")
>> MCA btl: parameter "btl_gm_max_rdma_size" (current value: "131072")
>> MCA btl: parameter "btl_gm_flags" (current value: "50")
>> MCA btl: parameter "btl_gm_bandwidth" (current value: "250")
>> MCA btl: parameter "btl_gm_priority" (current value: "0")
>> MCA btl: parameter "btl_base_warn_component_unused" (current value: "1")
>>          This parameter is used to turn on warning messages when certain NICs are not used
>>
>> % mpirun --prefix $OMPI -np 4 --host fog33,fog33,fog34,fog34 -mca btl self,sm,gm ./hpcc
>> --------------------------------------------------------------------------
>> [0,1,1]: Myrinet/GM on host fog33 was unable to find any NICs.
>> Another transport will be used instead, although this may result in
>> lower performance.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> [0,1,0]: Myrinet/GM on host fog33 was unable to find any NICs.
>> Another transport will be used instead, although this may result in
>> lower performance.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> Process 0.1.3 is unable to reach 0.1.0 for MPI communication.
>> If you specified the use of a BTL component, you may have
>> forgotten a component (such as "self") in the list of
>> usable components.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> Process 0.1.1 is unable to reach 0.1.2 for MPI communication.
>> If you specified the use of a BTL component, you may have
>> forgotten a component (such as "self") in the list of
>> usable components.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or
>> environment problems. This failure appears to be an internal failure;
>> here's some additional information (which may only be relevant to an
>> Open MPI developer):
>>
>> PML add procs failed
>> --> Returned "Unreachable" (-12) instead of "Success" (0)
>> --------------------------------------------------------------------------
>> *** An error occurred in MPI_Init
>> *** before MPI was initialized
>> *** MPI_ERRORS_ARE_FATAL (goodbye)
>>
>>
>>
>> % ls -l $OMPI
>> total 1
>> drwx------ 2 atchley softies 496 Nov 21 13:01 bin
>> drwx------ 2 atchley softies 168 Nov 21 13:01 etc
>> drwx------ 3 atchley softies 184 Nov 21 13:01 include
>> drwx------ 3 atchley softies 896 Nov 21 13:01 lib
>> drwx------ 4 atchley softies 96 Nov 21 13:01 man
>> drwx------ 3 atchley softies 72 Nov 21 13:00 share
>>
>>
>> % ls -l $OMPI/bin
>> total 340
>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 mpiCC -> opal_wrapper
>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 mpic++ -> opal_wrapper
>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 mpicc -> opal_wrapper
>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 mpicxx -> opal_wrapper
>> lrwxrwxrwx 1 atchley softies      7 Nov 21 13:01 mpiexec -> orterun
>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 mpif77 -> opal_wrapper
>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 mpif90 -> opal_wrapper
>> lrwxrwxrwx 1 atchley softies      7 Nov 21 13:01 mpirun -> orterun
>> -rwxr-xr-x 1 atchley softies 138416 Nov 21 13:01 ompi_info
>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:00 opalCC -> opal_wrapper
>> -rwxr-xr-x 1 atchley softies  24119 Nov 21 13:00 opal_wrapper
>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:00 opalc++ -> opal_wrapper
>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:00 opalcc -> opal_wrapper
>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 orteCC -> opal_wrapper
>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 ortec++ -> opal_wrapper
>> lrwxrwxrwx 1 atchley softies     12 Nov 21 13:01 ortecc -> opal_wrapper
>> -rwxr-xr-x 1 atchley softies  26536 Nov 21 13:01 orted
>> -rwxr-xr-x 1 atchley softies 154770 Nov 21 13:01 orterun
>>
>> % ls -l $OMPI/lib
>> total 1741
>> -rwxr-xr-x 1 atchley softies   1045 Nov 21 13:01 libmca_common_sm.la
>> lrwxrwxrwx 1 atchley softies     25 Nov 21 13:01 libmca_common_sm.so -> libmca_common_sm.so.0.0.0
>> lrwxrwxrwx 1 atchley softies     25 Nov 21 13:01 libmca_common_sm.so.0 -> libmca_common_sm.so.0.0.0
>> -rwxr-xr-x 1 atchley softies  10074 Nov 21 13:01 libmca_common_sm.so.0.0.0
>> -rwxr-xr-x 1 atchley softies   1100 Nov 21 13:01 libmpi.la
>> lrwxrwxrwx 1 atchley softies     15 Nov 21 13:01 libmpi.so -> libmpi.so.0.0.0
>> lrwxrwxrwx 1 atchley softies     15 Nov 21 13:01 libmpi.so.0 -> libmpi.so.0.0.0
>> -rwxr-xr-x 1 atchley softies 640672 Nov 21 13:01 libmpi.so.0.0.0
>> -rwxr-xr-x 1 atchley softies   1005 Nov 21 13:01 libmpi_cxx.la
>> lrwxrwxrwx 1 atchley softies     19 Nov 21 13:01 libmpi_cxx.so -> libmpi_cxx.so.0.0.0
>> lrwxrwxrwx 1 atchley softies     19 Nov 21 13:01 libmpi_cxx.so.0 -> libmpi_cxx.so.0.0.0
>> -rwxr-xr-x 1 atchley softies 142062 Nov 21 13:01 libmpi_cxx.so.0.0.0
>> -rwxr-xr-x 1 atchley softies   1009 Nov 21 13:01 libmpi_f77.la
>> lrwxrwxrwx 1 atchley softies     19 Nov 21 13:01 libmpi_f77.so -> libmpi_f77.so.0.0.0
>> lrwxrwxrwx 1 atchley softies     19 Nov 21 13:01 libmpi_f77.so.0 -> libmpi_f77.so.0.0.0
>> -rwxr-xr-x 1 atchley softies 283394 Nov 21 13:01 libmpi_f77.so.0.0.0
>> -rwxr-xr-x 1 atchley softies    996 Nov 21 13:00 libopal.la
>> lrwxrwxrwx 1 atchley softies     16 Nov 21 13:00 libopal.so -> libopal.so.0.0.0
>> lrwxrwxrwx 1 atchley softies     16 Nov 21 13:00 libopal.so.0 -> libopal.so.0.0.0
>> -rwxr-xr-x 1 atchley softies 285769 Nov 21 13:00 libopal.so.0.0.0
>> -rwxr-xr-x 1 atchley softies   1051 Nov 21 13:00 liborte.la
>> lrwxrwxrwx 1 atchley softies     16 Nov 21 13:00 liborte.so -> liborte.so.0.0.0
>> lrwxrwxrwx 1 atchley softies     16 Nov 21 13:00 liborte.so.0 -> liborte.so.0.0.0
>> -rwxr-xr-x 1 atchley softies 380223 Nov 21 13:00 liborte.so.0.0.0
>> drwx------ 2 atchley softies 4160 Nov 21 13:01 openmpi