Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] SM component init unload
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-07-03 15:44:57


Interesting - yes, coll sm doesn't think they are on the same node for some reason. Try adding -mca grpcomm_base_verbose 5 to the command line and let's see why.
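
To make the suggested change concrete, here is a minimal sketch that assembles the launch command from a dictionary of MCA parameters, which also avoids passing the same parameter twice. The helper name build_mpiexec_command is hypothetical and only builds the argument list; it does not launch anything. The parameter names and values are taken from the command quoted below, plus the grpcomm_base_verbose setting suggested above. It uses the double-dash spelling --mca; Open MPI accepts both -mca and --mca.

```python
import shlex


def build_mpiexec_command(launcher, mca_params, nprocs, program, extra_flags=()):
    """Assemble an mpiexec argument list (hypothetical helper, nothing is executed)."""
    argv = [launcher]
    # A dict holds each MCA parameter once, so no flag is repeated.
    for name, value in mca_params.items():
        argv += ["--mca", name, str(value)]
    argv += list(extra_flags)
    argv += ["-n", str(nprocs), program]
    return argv


# Values copied from the quoted command, plus the suggested verbosity flag.
params = {
    "coll_sm_priority": 99,
    "coll_base_verbose": 100,
    "grpcomm_base_verbose": 5,
}
cmd = build_mpiexec_command("mpiexec", params, 2, "./bmem", ["--display-map"])
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be handed to subprocess.run if launching from Python is preferred.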

On Jul 3, 2012, at 1:24 PM, Juan Antonio Rico Gallego wrote:

> The code I run is a simple broadcast.
>
> When I do not specify components to run, the output is (more verbose):
>
> [jarico_at_Metropolis-01 examples]$ /home/jarico/shared/packages/openmpi-cas-dbg/bin/mpiexec --mca mca_base_verbose 100 --mca mca_coll_base_output 100 --mca coll_sm_priority 99 -mca hwloc_base_verbose 90 --display-map --mca mca_verbose 100 --mca mca_base_verbose 100 --mca coll_base_verbose 100 -n 2 ./bmem
> [Metropolis-01:24490] mca: base: components_open: Looking for hwloc components
> [Metropolis-01:24490] mca: base: components_open: opening hwloc components
> [Metropolis-01:24490] mca: base: components_open: found loaded component hwloc142
> [Metropolis-01:24490] mca: base: components_open: component hwloc142 has no register function
> [Metropolis-01:24490] mca: base: components_open: component hwloc142 has no open function
> [Metropolis-01:24490] hwloc:base:get_topology
> [Metropolis-01:24490] hwloc:base: no cpus specified - using root available cpuset
>
> ======================== JOB MAP ========================
>
> Data for node: Metropolis-01 Num procs: 2
> Process OMPI jobid: [36336,1] App: 0 Process rank: 0
> Process OMPI jobid: [36336,1] App: 0 Process rank: 1
>
> =============================================================
> [Metropolis-01:24491] mca: base: components_open: Looking for hwloc components
> [Metropolis-01:24491] mca: base: components_open: opening hwloc components
> [Metropolis-01:24491] mca: base: components_open: found loaded component hwloc142
> [Metropolis-01:24491] mca: base: components_open: component hwloc142 has no register function
> [Metropolis-01:24491] mca: base: components_open: component hwloc142 has no open function
> [Metropolis-01:24492] mca: base: components_open: Looking for hwloc components
> [Metropolis-01:24492] mca: base: components_open: opening hwloc components
> [Metropolis-01:24492] mca: base: components_open: found loaded component hwloc142
> [Metropolis-01:24492] mca: base: components_open: component hwloc142 has no register function
> [Metropolis-01:24492] mca: base: components_open: component hwloc142 has no open function
> [Metropolis-01:24491] locality: CL:CU:N:B
> [Metropolis-01:24491] hwloc:base: get available cpus
> [Metropolis-01:24491] hwloc:base:get_available_cpus first time - filtering cpus
> [Metropolis-01:24491] hwloc:base: no cpus specified - using root available cpuset
> [Metropolis-01:24491] hwloc:base:get_available_cpus root object
> [Metropolis-01:24491] mca: base: components_open: Looking for coll components
> [Metropolis-01:24491] mca: base: components_open: opening coll components
> [Metropolis-01:24491] mca: base: components_open: found loaded component tuned
> [Metropolis-01:24491] mca: base: components_open: component tuned has no register function
> [Metropolis-01:24491] coll:tuned:component_open: done!
> [Metropolis-01:24491] mca: base: components_open: component tuned open function successful
> [Metropolis-01:24491] mca: base: components_open: found loaded component sm
> [Metropolis-01:24491] mca: base: components_open: component sm register function successful
> [Metropolis-01:24491] mca: base: components_open: component sm has no open function
> [Metropolis-01:24491] mca: base: components_open: found loaded component libnbc
> [Metropolis-01:24491] mca: base: components_open: component libnbc register function successful
> [Metropolis-01:24491] mca: base: components_open: component libnbc open function successful
> [Metropolis-01:24491] mca: base: components_open: found loaded component hierarch
> [Metropolis-01:24491] mca: base: components_open: component hierarch has no register function
> [Metropolis-01:24491] mca: base: components_open: component hierarch open function successful
> [Metropolis-01:24491] mca: base: components_open: found loaded component basic
> [Metropolis-01:24491] mca: base: components_open: component basic register function successful
> [Metropolis-01:24491] mca: base: components_open: component basic has no open function
> [Metropolis-01:24491] mca: base: components_open: found loaded component inter
> [Metropolis-01:24491] mca: base: components_open: component inter has no register function
> [Metropolis-01:24491] mca: base: components_open: component inter open function successful
> [Metropolis-01:24491] mca: base: components_open: found loaded component self
> [Metropolis-01:24491] mca: base: components_open: component self has no register function
> [Metropolis-01:24491] mca: base: components_open: component self open function successful
> [Metropolis-01:24492] locality: CL:CU:N:B
> [Metropolis-01:24492] hwloc:base: get available cpus
> [Metropolis-01:24492] hwloc:base:get_available_cpus first time - filtering cpus
> [Metropolis-01:24492] hwloc:base: no cpus specified - using root available cpuset
> [Metropolis-01:24492] hwloc:base:get_available_cpus root object
> [Metropolis-01:24492] mca: base: components_open: Looking for coll components
> [Metropolis-01:24492] mca: base: components_open: opening coll components
> [Metropolis-01:24492] mca: base: components_open: found loaded component tuned
> [Metropolis-01:24492] mca: base: components_open: component tuned has no register function
> [Metropolis-01:24492] coll:tuned:component_open: done!
> [Metropolis-01:24492] mca: base: components_open: component tuned open function successful
> [Metropolis-01:24492] mca: base: components_open: found loaded component sm
> [Metropolis-01:24492] mca: base: components_open: component sm register function successful
> [Metropolis-01:24492] mca: base: components_open: component sm has no open function
> [Metropolis-01:24492] mca: base: components_open: found loaded component libnbc
> [Metropolis-01:24492] mca: base: components_open: component libnbc register function successful
> [Metropolis-01:24492] mca: base: components_open: component libnbc open function successful
> [Metropolis-01:24492] mca: base: components_open: found loaded component hierarch
> [Metropolis-01:24492] mca: base: components_open: component hierarch has no register function
> [Metropolis-01:24492] mca: base: components_open: component hierarch open function successful
> [Metropolis-01:24492] mca: base: components_open: found loaded component basic
> [Metropolis-01:24492] mca: base: components_open: component basic register function successful
> [Metropolis-01:24492] mca: base: components_open: component basic has no open function
> [Metropolis-01:24492] mca: base: components_open: found loaded component inter
> [Metropolis-01:24492] mca: base: components_open: component inter has no register function
> [Metropolis-01:24492] mca: base: components_open: component inter open function successful
> [Metropolis-01:24492] mca: base: components_open: found loaded component self
> [Metropolis-01:24492] mca: base: components_open: component self has no register function
> [Metropolis-01:24492] mca: base: components_open: component self open function successful
> [Metropolis-01:24491] coll:find_available: querying coll component tuned
> [Metropolis-01:24491] coll:find_available: coll component tuned is available
> [Metropolis-01:24491] coll:find_available: querying coll component sm
> [Metropolis-01:24491] coll:sm:init_query: no other local procs; disqualifying myself
> [Metropolis-01:24491] coll:find_available: coll component sm is not available
> [Metropolis-01:24491] coll:find_available: querying coll component libnbc
> [Metropolis-01:24491] coll:find_available: coll component libnbc is available
> [Metropolis-01:24491] coll:find_available: querying coll component hierarch
> [Metropolis-01:24491] coll:find_available: coll component hierarch is available
> [Metropolis-01:24491] coll:find_available: querying coll component basic
> [Metropolis-01:24491] coll:find_available: coll component basic is available
> [Metropolis-01:24491] coll:find_available: querying coll component inter
> [Metropolis-01:24492] coll:find_available: querying coll component tuned
> [Metropolis-01:24492] coll:find_available: coll component tuned is available
> [Metropolis-01:24492] coll:find_available: querying coll component sm
> [Metropolis-01:24492] coll:sm:init_query: no other local procs; disqualifying myself
> [Metropolis-01:24492] coll:find_available: coll component sm is not available
> [Metropolis-01:24492] coll:find_available: querying coll component libnbc
> [Metropolis-01:24492] coll:find_available: coll component libnbc is available
> [Metropolis-01:24492] coll:find_available: querying coll component hierarch
> [Metropolis-01:24492] coll:find_available: coll component hierarch is available
> [Metropolis-01:24492] coll:find_available: querying coll component basic
> [Metropolis-01:24492] coll:find_available: coll component basic is available
> [Metropolis-01:24492] coll:find_available: querying coll component inter
> [Metropolis-01:24492] coll:find_available: coll component inter is available
> [Metropolis-01:24492] coll:find_available: querying coll component self
> [Metropolis-01:24492] coll:find_available: coll component self is available
> [Metropolis-01:24491] coll:find_available: coll component inter is available
> [Metropolis-01:24491] coll:find_available: querying coll component self
> [Metropolis-01:24491] coll:find_available: coll component self is available
> [Metropolis-01:24492] hwloc:base:get_nbojbs computed data 0 of NUMANode:0
> [Metropolis-01:24491] hwloc:base:get_nbojbs computed data 0 of NUMANode:0
> [Metropolis-01:24491] coll:base:comm_select: new communicator: MPI_COMM_WORLD (cid 0)
> [Metropolis-01:24491] coll:base:comm_select: Checking all available modules
> [Metropolis-01:24491] coll:tuned:module_tuned query called
> [Metropolis-01:24491] coll:base:comm_select: component available: tuned, priority: 30
> [Metropolis-01:24491] coll:base:comm_select: component available: libnbc, priority: 10
> [Metropolis-01:24491] coll:base:comm_select: component not available: hierarch
> [Metropolis-01:24491] coll:base:comm_select: component available: basic, priority: 10
> [Metropolis-01:24491] coll:base:comm_select: component not available: inter
> [Metropolis-01:24491] coll:base:comm_select: component not available: self
> [Metropolis-01:24491] coll:tuned:module_init called.
> [Metropolis-01:24491] coll:tuned:module_init Tuned is in use
> [Metropolis-01:24491] coll:base:comm_select: new communicator: MPI_COMM_SELF (cid 1)
> [Metropolis-01:24491] coll:base:comm_select: Checking all available modules
> [Metropolis-01:24491] coll:tuned:module_tuned query called
> [Metropolis-01:24491] coll:base:comm_select: component not available: tuned
> [Metropolis-01:24491] coll:base:comm_select: component available: libnbc, priority: 10
> [Metropolis-01:24491] coll:base:comm_select: component not available: hierarch
> [Metropolis-01:24491] coll:base:comm_select: component available: basic, priority: 10
> [Metropolis-01:24491] coll:base:comm_select: component not available: inter
> [Metropolis-01:24491] coll:base:comm_select: component available: self, priority: 75
> [Metropolis-01:24492] coll:base:comm_select: new communicator: MPI_COMM_WORLD (cid 0)
> [Metropolis-01:24492] coll:base:comm_select: Checking all available modules
> [Metropolis-01:24492] coll:tuned:module_tuned query called
> [Metropolis-01:24492] coll:base:comm_select: component available: tuned, priority: 30
> [Metropolis-01:24492] coll:base:comm_select: component available: libnbc, priority: 10
> [Metropolis-01:24492] coll:base:comm_select: component not available: hierarch
> [Metropolis-01:24492] coll:base:comm_select: component available: basic, priority: 10
> [Metropolis-01:24492] coll:base:comm_select: component not available: inter
> [Metropolis-01:24492] coll:base:comm_select: component not available: self
> [Metropolis-01:24492] coll:tuned:module_init called.
> [Metropolis-01:24492] coll:tuned:module_init Tuned is in use
> [Metropolis-01:24492] coll:base:comm_select: new communicator: MPI_COMM_SELF (cid 1)
> [Metropolis-01:24492] coll:base:comm_select: Checking all available modules
> [Metropolis-01:24492] coll:tuned:module_tuned query called
> [Metropolis-01:24492] coll:base:comm_select: component not available: tuned
> [Metropolis-01:24492] coll:base:comm_select: component available: libnbc, priority: 10
> [Metropolis-01:24492] coll:base:comm_select: component not available: hierarch
> [Metropolis-01:24492] coll:base:comm_select: component available: basic, priority: 10
> [Metropolis-01:24492] coll:base:comm_select: component not available: inter
> [Metropolis-01:24492] coll:base:comm_select: component available: self, priority: 75
> [Metropolis-01:24491] coll:tuned:component_close: called
> [Metropolis-01:24491] coll:tuned:component_close: done!
> [Metropolis-01:24492] coll:tuned:component_close: called
> [Metropolis-01:24492] coll:tuned:component_close: done!
> [Metropolis-01:24492] mca: base: close: component tuned closed
> [Metropolis-01:24492] mca: base: close: unloading component tuned
> [Metropolis-01:24492] mca: base: close: component libnbc closed
> [Metropolis-01:24492] mca: base: close: unloading component libnbc
> [Metropolis-01:24492] mca: base: close: unloading component hierarch
> [Metropolis-01:24492] mca: base: close: unloading component basic
> [Metropolis-01:24492] mca: base: close: unloading component inter
> [Metropolis-01:24492] mca: base: close: unloading component self
> [Metropolis-01:24491] mca: base: close: component tuned closed
> [Metropolis-01:24491] mca: base: close: unloading component tuned
> [Metropolis-01:24491] mca: base: close: component libnbc closed
> [Metropolis-01:24491] mca: base: close: unloading component libnbc
> [Metropolis-01:24491] mca: base: close: unloading component hierarch
> [Metropolis-01:24491] mca: base: close: unloading component basic
> [Metropolis-01:24491] mca: base: close: unloading component inter
> [Metropolis-01:24491] mca: base: close: unloading component self
> [jarico_at_Metropolis-01 examples]$
>
>
> SM is not loaded because it detects no other processes on the same machine:
>
> [Metropolis-01:24491] coll:sm:init_query: no other local procs; disqualifying myself
>
> The machine is a multicore machine with 8 cores.
>
> I need to run the SM component code, and I assume that with its priority raised it will be the selected component once this problem is solved.
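
With output this verbose, the relevant lines are easy to miss. The following is a rough sketch for filtering them out of a saved log: it reports, per process ID, which coll components were marked "not available". The function name and regular expressions are my own and were written against the message format shown in this thread only; other Open MPI versions may word these messages differently.

```python
import re
from collections import defaultdict

# "[host:pid] message", as printed in the verbose output above.
LOG_LINE = re.compile(r"\[(?P<host>[^:\]]+):(?P<pid>\d+)\]\s+(?P<msg>.*)")
NOT_AVAILABLE = re.compile(
    r"coll:find_available: coll component (\w+) is not available"
)


def unavailable_components(log_text):
    """Map each PID to the coll components it reported as not available."""
    result = defaultdict(list)
    for raw in log_text.splitlines():
        # Drop e-mail quote markers ("> ") before matching.
        line = raw.lstrip("> ").strip()
        m = LOG_LINE.match(line)
        if not m:
            continue
        hit = NOT_AVAILABLE.search(m.group("msg"))
        if hit:
            result[int(m.group("pid"))].append(hit.group(1))
    return dict(result)


sample = """\
> [Metropolis-01:24491] coll:find_available: querying coll component sm
> [Metropolis-01:24491] coll:sm:init_query: no other local procs; disqualifying myself
> [Metropolis-01:24491] coll:find_available: coll component sm is not available
> [Metropolis-01:24492] coll:find_available: coll component sm is not available
> [Metropolis-01:24492] coll:find_available: coll component libnbc is available
"""
print(unavailable_components(sample))
```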
>
>
>
> On 03/07/2012, at 21:01, Jeff Squyres wrote:
>
>> The issue is that the "sm" coll component only implements a few of the MPI collective operations. It is usually mixed at run-time with other coll components to fill out the rest of the MPI collective operations.
>>
>> So what is happening is that OMPI is determining that it doesn't have implementations of all the MPI collective operations and aborting.
>>
>> You shouldn't need to manually select your coll module -- OMPI should automatically select the right collective module for you. E.g., if all procs are local on a single machine and sm has a matching implementation for that MPI collective operation, it'll be used.
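
The mixing described above can be pictured with a small model: for each collective operation, take the highest-priority available component that implements it. This is a simplified sketch, not Open MPI's actual selection code. The priorities for tuned (30) and basic (10) come from the log in this thread and 99 is the coll_sm_priority value from the command line, but the sets of operations per component are illustrative assumptions.

```python
def assemble_collectives(components, required_ops):
    """For each operation, pick the highest-priority component implementing it.

    Returns (table, missing): table maps operation -> component name, and
    missing lists the operations that no component provides.
    """
    table = {}
    for op in required_ops:
        candidates = [c for c in components if op in c["ops"]]
        if candidates:
            table[op] = max(candidates, key=lambda c: c["priority"])["name"]
    missing = [op for op in required_ops if op not in table]
    return table, missing


# Operation sets below are assumptions made for illustration only.
ALL_OPS = ["barrier", "bcast", "reduce", "gather"]
sm = {"name": "sm", "priority": 99, "ops": {"barrier", "bcast", "reduce"}}
tuned = {"name": "tuned", "priority": 30, "ops": set(ALL_OPS)}
basic = {"name": "basic", "priority": 10, "ops": set(ALL_OPS)}

mixed_table, mixed_missing = assemble_collectives([sm, tuned, basic], ALL_OPS)
only_sm_table, only_sm_missing = assemble_collectives([sm], ALL_OPS)
print(mixed_table, mixed_missing)
print(only_sm_table, only_sm_missing)
```

In this model, restricting the run to sm alone leaves at least one operation without an implementation, which corresponds to the abort described above, while the mixed case fills every slot.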
>>
>>
>>
>> On Jul 3, 2012, at 2:48 PM, Juan Antonio Rico Gallego wrote:
>>
>>> Output is:
>>>
>>> [Metropolis-01:15355] hwloc:base:get_topology
>>> [Metropolis-01:15355] hwloc:base: no cpus specified - using root available cpuset
>>>
>>> ======================== JOB MAP ========================
>>>
>>> Data for node: Metropolis-01 Num procs: 2
>>> Process OMPI jobid: [59809,1] App: 0 Process rank: 0
>>> Process OMPI jobid: [59809,1] App: 0 Process rank: 1
>>>
>>> =============================================================
>>> [Metropolis-01:15356] locality: CL:CU:N:B
>>> [Metropolis-01:15356] hwloc:base: get available cpus
>>> [Metropolis-01:15356] hwloc:base:get_available_cpus first time - filtering cpus
>>> [Metropolis-01:15356] hwloc:base: no cpus specified - using root available cpuset
>>> [Metropolis-01:15356] hwloc:base:get_available_cpus root object
>>> [Metropolis-01:15357] locality: CL:CU:N:B
>>> [Metropolis-01:15357] hwloc:base: get available cpus
>>> [Metropolis-01:15357] hwloc:base:get_available_cpus first time - filtering cpus
>>> [Metropolis-01:15357] hwloc:base: no cpus specified - using root available cpuset
>>> [Metropolis-01:15357] hwloc:base:get_available_cpus root object
>>> [Metropolis-01:15356] hwloc:base:get_nbojbs computed data 0 of NUMANode:0
>>> [Metropolis-01:15357] hwloc:base:get_nbojbs computed data 0 of NUMANode:0
>>>
>>>
>>> Regards,
>>> Juan A. Rico
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>>
>> --
>> Jeff Squyres
>> jsquyres_at_[hidden]
>> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>>
>
>