Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] SM component init unload
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-07-03 15:44:57


Interesting. Yes, coll sm doesn't think they are on the same node for some reason. Try adding -mca grpcomm_base_verbose 5 and let's see why.

On Jul 3, 2012, at 1:24 PM, Juan Antonio Rico Gallego wrote:

> The code I run is a simple broadcast.
>
> When I do not specify which components to run, the output is (with more verbosity enabled):
>
> [jarico_at_Metropolis-01 examples]$ /home/jarico/shared/packages/openmpi-cas-dbg/bin/mpiexec --mca mca_base_verbose 100 --mca mca_coll_base_output 100 --mca coll_sm_priority 99 -mca hwloc_base_verbose 90 --display-map --mca mca_verbose 100 --mca mca_base_verbose 100 --mca coll_base_verbose 100 -n 2 ./bmem
> [Metropolis-01:24490] mca: base: components_open: Looking for hwloc components
> [Metropolis-01:24490] mca: base: components_open: opening hwloc components
> [Metropolis-01:24490] mca: base: components_open: found loaded component hwloc142
> [Metropolis-01:24490] mca: base: components_open: component hwloc142 has no register function
> [Metropolis-01:24490] mca: base: components_open: component hwloc142 has no open function
> [Metropolis-01:24490] hwloc:base:get_topology
> [Metropolis-01:24490] hwloc:base: no cpus specified - using root available cpuset
>
> ======================== JOB MAP ========================
>
> Data for node: Metropolis-01 Num procs: 2
> Process OMPI jobid: [36336,1] App: 0 Process rank: 0
> Process OMPI jobid: [36336,1] App: 0 Process rank: 1
>
> =============================================================
> [Metropolis-01:24491] mca: base: components_open: Looking for hwloc components
> [Metropolis-01:24491] mca: base: components_open: opening hwloc components
> [Metropolis-01:24491] mca: base: components_open: found loaded component hwloc142
> [Metropolis-01:24491] mca: base: components_open: component hwloc142 has no register function
> [Metropolis-01:24491] mca: base: components_open: component hwloc142 has no open function
> [Metropolis-01:24492] mca: base: components_open: Looking for hwloc components
> [Metropolis-01:24492] mca: base: components_open: opening hwloc components
> [Metropolis-01:24492] mca: base: components_open: found loaded component hwloc142
> [Metropolis-01:24492] mca: base: components_open: component hwloc142 has no register function
> [Metropolis-01:24492] mca: base: components_open: component hwloc142 has no open function
> [Metropolis-01:24491] locality: CL:CU:N:B
> [Metropolis-01:24491] hwloc:base: get available cpus
> [Metropolis-01:24491] hwloc:base:get_available_cpus first time - filtering cpus
> [Metropolis-01:24491] hwloc:base: no cpus specified - using root available cpuset
> [Metropolis-01:24491] hwloc:base:get_available_cpus root object
> [Metropolis-01:24491] mca: base: components_open: Looking for coll components
> [Metropolis-01:24491] mca: base: components_open: opening coll components
> [Metropolis-01:24491] mca: base: components_open: found loaded component tuned
> [Metropolis-01:24491] mca: base: components_open: component tuned has no register function
> [Metropolis-01:24491] coll:tuned:component_open: done!
> [Metropolis-01:24491] mca: base: components_open: component tuned open function successful
> [Metropolis-01:24491] mca: base: components_open: found loaded component sm
> [Metropolis-01:24491] mca: base: components_open: component sm register function successful
> [Metropolis-01:24491] mca: base: components_open: component sm has no open function
> [Metropolis-01:24491] mca: base: components_open: found loaded component libnbc
> [Metropolis-01:24491] mca: base: components_open: component libnbc register function successful
> [Metropolis-01:24491] mca: base: components_open: component libnbc open function successful
> [Metropolis-01:24491] mca: base: components_open: found loaded component hierarch
> [Metropolis-01:24491] mca: base: components_open: component hierarch has no register function
> [Metropolis-01:24491] mca: base: components_open: component hierarch open function successful
> [Metropolis-01:24491] mca: base: components_open: found loaded component basic
> [Metropolis-01:24491] mca: base: components_open: component basic register function successful
> [Metropolis-01:24491] mca: base: components_open: component basic has no open function
> [Metropolis-01:24491] mca: base: components_open: found loaded component inter
> [Metropolis-01:24491] mca: base: components_open: component inter has no register function
> [Metropolis-01:24491] mca: base: components_open: component inter open function successful
> [Metropolis-01:24491] mca: base: components_open: found loaded component self
> [Metropolis-01:24491] mca: base: components_open: component self has no register function
> [Metropolis-01:24491] mca: base: components_open: component self open function successful
> [Metropolis-01:24492] locality: CL:CU:N:B
> [Metropolis-01:24492] hwloc:base: get available cpus
> [Metropolis-01:24492] hwloc:base:get_available_cpus first time - filtering cpus
> [Metropolis-01:24492] hwloc:base: no cpus specified - using root available cpuset
> [Metropolis-01:24492] hwloc:base:get_available_cpus root object
> [Metropolis-01:24492] mca: base: components_open: Looking for coll components
> [Metropolis-01:24492] mca: base: components_open: opening coll components
> [Metropolis-01:24492] mca: base: components_open: found loaded component tuned
> [Metropolis-01:24492] mca: base: components_open: component tuned has no register function
> [Metropolis-01:24492] coll:tuned:component_open: done!
> [Metropolis-01:24492] mca: base: components_open: component tuned open function successful
> [Metropolis-01:24492] mca: base: components_open: found loaded component sm
> [Metropolis-01:24492] mca: base: components_open: component sm register function successful
> [Metropolis-01:24492] mca: base: components_open: component sm has no open function
> [Metropolis-01:24492] mca: base: components_open: found loaded component libnbc
> [Metropolis-01:24492] mca: base: components_open: component libnbc register function successful
> [Metropolis-01:24492] mca: base: components_open: component libnbc open function successful
> [Metropolis-01:24492] mca: base: components_open: found loaded component hierarch
> [Metropolis-01:24492] mca: base: components_open: component hierarch has no register function
> [Metropolis-01:24492] mca: base: components_open: component hierarch open function successful
> [Metropolis-01:24492] mca: base: components_open: found loaded component basic
> [Metropolis-01:24492] mca: base: components_open: component basic register function successful
> [Metropolis-01:24492] mca: base: components_open: component basic has no open function
> [Metropolis-01:24492] mca: base: components_open: found loaded component inter
> [Metropolis-01:24492] mca: base: components_open: component inter has no register function
> [Metropolis-01:24492] mca: base: components_open: component inter open function successful
> [Metropolis-01:24492] mca: base: components_open: found loaded component self
> [Metropolis-01:24492] mca: base: components_open: component self has no register function
> [Metropolis-01:24492] mca: base: components_open: component self open function successful
> [Metropolis-01:24491] coll:find_available: querying coll component tuned
> [Metropolis-01:24491] coll:find_available: coll component tuned is available
> [Metropolis-01:24491] coll:find_available: querying coll component sm
> [Metropolis-01:24491] coll:sm:init_query: no other local procs; disqualifying myself
> [Metropolis-01:24491] coll:find_available: coll component sm is not available
> [Metropolis-01:24491] coll:find_available: querying coll component libnbc
> [Metropolis-01:24491] coll:find_available: coll component libnbc is available
> [Metropolis-01:24491] coll:find_available: querying coll component hierarch
> [Metropolis-01:24491] coll:find_available: coll component hierarch is available
> [Metropolis-01:24491] coll:find_available: querying coll component basic
> [Metropolis-01:24491] coll:find_available: coll component basic is available
> [Metropolis-01:24491] coll:find_available: querying coll component inter
> [Metropolis-01:24492] coll:find_available: querying coll component tuned
> [Metropolis-01:24492] coll:find_available: coll component tuned is available
> [Metropolis-01:24492] coll:find_available: querying coll component sm
> [Metropolis-01:24492] coll:sm:init_query: no other local procs; disqualifying myself
> [Metropolis-01:24492] coll:find_available: coll component sm is not available
> [Metropolis-01:24492] coll:find_available: querying coll component libnbc
> [Metropolis-01:24492] coll:find_available: coll component libnbc is available
> [Metropolis-01:24492] coll:find_available: querying coll component hierarch
> [Metropolis-01:24492] coll:find_available: coll component hierarch is available
> [Metropolis-01:24492] coll:find_available: querying coll component basic
> [Metropolis-01:24492] coll:find_available: coll component basic is available
> [Metropolis-01:24492] coll:find_available: querying coll component inter
> [Metropolis-01:24492] coll:find_available: coll component inter is available
> [Metropolis-01:24492] coll:find_available: querying coll component self
> [Metropolis-01:24492] coll:find_available: coll component self is available
> [Metropolis-01:24491] coll:find_available: coll component inter is available
> [Metropolis-01:24491] coll:find_available: querying coll component self
> [Metropolis-01:24491] coll:find_available: coll component self is available
> [Metropolis-01:24492] hwloc:base:get_nbojbs computed data 0 of NUMANode:0
> [Metropolis-01:24491] hwloc:base:get_nbojbs computed data 0 of NUMANode:0
> [Metropolis-01:24491] coll:base:comm_select: new communicator: MPI_COMM_WORLD (cid 0)
> [Metropolis-01:24491] coll:base:comm_select: Checking all available modules
> [Metropolis-01:24491] coll:tuned:module_tuned query called
> [Metropolis-01:24491] coll:base:comm_select: component available: tuned, priority: 30
> [Metropolis-01:24491] coll:base:comm_select: component available: libnbc, priority: 10
> [Metropolis-01:24491] coll:base:comm_select: component not available: hierarch
> [Metropolis-01:24491] coll:base:comm_select: component available: basic, priority: 10
> [Metropolis-01:24491] coll:base:comm_select: component not available: inter
> [Metropolis-01:24491] coll:base:comm_select: component not available: self
> [Metropolis-01:24491] coll:tuned:module_init called.
> [Metropolis-01:24491] coll:tuned:module_init Tuned is in use
> [Metropolis-01:24491] coll:base:comm_select: new communicator: MPI_COMM_SELF (cid 1)
> [Metropolis-01:24491] coll:base:comm_select: Checking all available modules
> [Metropolis-01:24491] coll:tuned:module_tuned query called
> [Metropolis-01:24491] coll:base:comm_select: component not available: tuned
> [Metropolis-01:24491] coll:base:comm_select: component available: libnbc, priority: 10
> [Metropolis-01:24491] coll:base:comm_select: component not available: hierarch
> [Metropolis-01:24491] coll:base:comm_select: component available: basic, priority: 10
> [Metropolis-01:24491] coll:base:comm_select: component not available: inter
> [Metropolis-01:24491] coll:base:comm_select: component available: self, priority: 75
> [Metropolis-01:24492] coll:base:comm_select: new communicator: MPI_COMM_WORLD (cid 0)
> [Metropolis-01:24492] coll:base:comm_select: Checking all available modules
> [Metropolis-01:24492] coll:tuned:module_tuned query called
> [Metropolis-01:24492] coll:base:comm_select: component available: tuned, priority: 30
> [Metropolis-01:24492] coll:base:comm_select: component available: libnbc, priority: 10
> [Metropolis-01:24492] coll:base:comm_select: component not available: hierarch
> [Metropolis-01:24492] coll:base:comm_select: component available: basic, priority: 10
> [Metropolis-01:24492] coll:base:comm_select: component not available: inter
> [Metropolis-01:24492] coll:base:comm_select: component not available: self
> [Metropolis-01:24492] coll:tuned:module_init called.
> [Metropolis-01:24492] coll:tuned:module_init Tuned is in use
> [Metropolis-01:24492] coll:base:comm_select: new communicator: MPI_COMM_SELF (cid 1)
> [Metropolis-01:24492] coll:base:comm_select: Checking all available modules
> [Metropolis-01:24492] coll:tuned:module_tuned query called
> [Metropolis-01:24492] coll:base:comm_select: component not available: tuned
> [Metropolis-01:24492] coll:base:comm_select: component available: libnbc, priority: 10
> [Metropolis-01:24492] coll:base:comm_select: component not available: hierarch
> [Metropolis-01:24492] coll:base:comm_select: component available: basic, priority: 10
> [Metropolis-01:24492] coll:base:comm_select: component not available: inter
> [Metropolis-01:24492] coll:base:comm_select: component available: self, priority: 75
> [Metropolis-01:24491] coll:tuned:component_close: called
> [Metropolis-01:24491] coll:tuned:component_close: done!
> [Metropolis-01:24492] coll:tuned:component_close: called
> [Metropolis-01:24492] coll:tuned:component_close: done!
> [Metropolis-01:24492] mca: base: close: component tuned closed
> [Metropolis-01:24492] mca: base: close: unloading component tuned
> [Metropolis-01:24492] mca: base: close: component libnbc closed
> [Metropolis-01:24492] mca: base: close: unloading component libnbc
> [Metropolis-01:24492] mca: base: close: unloading component hierarch
> [Metropolis-01:24492] mca: base: close: unloading component basic
> [Metropolis-01:24492] mca: base: close: unloading component inter
> [Metropolis-01:24492] mca: base: close: unloading component self
> [Metropolis-01:24491] mca: base: close: component tuned closed
> [Metropolis-01:24491] mca: base: close: unloading component tuned
> [Metropolis-01:24491] mca: base: close: component libnbc closed
> [Metropolis-01:24491] mca: base: close: unloading component libnbc
> [Metropolis-01:24491] mca: base: close: unloading component hierarch
> [Metropolis-01:24491] mca: base: close: unloading component basic
> [Metropolis-01:24491] mca: base: close: unloading component inter
> [Metropolis-01:24491] mca: base: close: unloading component self
> [jarico_at_Metropolis-01 examples]$
>
>
> SM is not loaded because it detects no other processes on the same machine:
>
> [Metropolis-01:24491] coll:sm:init_query: no other local procs; disqualifying myself
>
> The machine is a multicore system with 8 cores, and both processes run on it.
>
> I need to run the SM component code, and I assume that once this problem is solved, raising its priority will make it the selected component.
>
>
>
> On 03/07/2012, at 21:01, Jeff Squyres wrote:
>
>> The issue is that the "sm" coll component only implements a few of the MPI collective operations. It is usually mixed at run-time with other coll components to fill out the rest of the MPI collective operations.
>>
>> So what is happening is that OMPI is determining that it doesn't have implementations of all the MPI collective operations and aborting.
>>
>> You shouldn't need to manually select your coll module -- OMPI should automatically select the right collective module for you. E.g., if all procs are local on a single machine and sm has a matching implementation for that MPI collective operation, it'll be used.
>>
>>
>>
>> On Jul 3, 2012, at 2:48 PM, Juan Antonio Rico Gallego wrote:
>>
>>> Output is:
>>>
>>> [Metropolis-01:15355] hwloc:base:get_topology
>>> [Metropolis-01:15355] hwloc:base: no cpus specified - using root available cpuset
>>>
>>> ======================== JOB MAP ========================
>>>
>>> Data for node: Metropolis-01 Num procs: 2
>>> Process OMPI jobid: [59809,1] App: 0 Process rank: 0
>>> Process OMPI jobid: [59809,1] App: 0 Process rank: 1
>>>
>>> =============================================================
>>> [Metropolis-01:15356] locality: CL:CU:N:B
>>> [Metropolis-01:15356] hwloc:base: get available cpus
>>> [Metropolis-01:15356] hwloc:base:get_available_cpus first time - filtering cpus
>>> [Metropolis-01:15356] hwloc:base: no cpus specified - using root available cpuset
>>> [Metropolis-01:15356] hwloc:base:get_available_cpus root object
>>> [Metropolis-01:15357] locality: CL:CU:N:B
>>> [Metropolis-01:15357] hwloc:base: get available cpus
>>> [Metropolis-01:15357] hwloc:base:get_available_cpus first time - filtering cpus
>>> [Metropolis-01:15357] hwloc:base: no cpus specified - using root available cpuset
>>> [Metropolis-01:15357] hwloc:base:get_available_cpus root object
>>> [Metropolis-01:15356] hwloc:base:get_nbojbs computed data 0 of NUMANode:0
>>> [Metropolis-01:15357] hwloc:base:get_nbojbs computed data 0 of NUMANode:0
>>>
>>>
>>> Regards,
>>> Juan A. Rico
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>>
>> --
>> Jeff Squyres
>> jsquyres_at_[hidden]
>> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>>
>
>