
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] SM component init unload
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-07-03 22:46:56


Good catch, George - thanks for the detailed explanation. I think what
happened here was that we changed the ordering in MPI_Init a while back -
we had always had a rule about when MPI components could access remote proc
info, but had grown lax about it, so at least some of the BTLs had to be
fixed at that time. Since nobody really uses SM coll, the fact that it also
violated the sequencing rules went undetected.

I'll work with Jeff to fix the initialization sequencing. I can't speak to
your advice about use in production as you are definitely the expert in
this area, but we should at least get it to work for those who want to play
with it.

Ralph

On Tue, Jul 3, 2012 at 6:59 PM, George Bosilca <bosilca_at_[hidden]> wrote:

> Juan,
>
> Something weird is going on there. The selection mechanism for the SM coll
> and SM BTL should be very similar. However, the SM BTL successfully selects
> itself while the SM coll fails to determine that all processes are local.
>
> In the coll SM the issue is that the remote procs do not have the LOCAL
> flag set, even when they are on the local node (however the
> ompi_proc_local() return has a special flag stating that all processes in
> the job are local). I compared the initialization of the SM BTL and the SM
> coll. It turns out that somehow the procs returned by ompi_proc_all() and
> the procs provided to the add_procs of the BTLs are not identical. The
> latter have the local flag correctly set, so I went a little deeper.
>
> Here is what I found while toying with gdb inside:
>
> breakpoint 1, mca_coll_sm_init_query (enable_progress_threads=false,
> enable_mpi_threads=false) at coll_sm_module.c:132
>
> (gdb) p procs[0]
> $1 = (ompi_proc_t *) 0x109a1e8c0
> (gdb) p procs[1]
> $2 = (ompi_proc_t *) 0x109a1e970
> (gdb) p procs[0]->proc_flags
> $3 = 0
> (gdb) p procs[1]->proc_flags
> $4 = 4095
>
> Breakpoint 2, mca_btl_sm_add_procs (btl=0x109baa1c0, nprocs=2,
> procs=0x109a319e0, peers=0x109a319f0, reachability=0x7fff691378e8) at
> btl_sm.c:427
>
> (gdb) p procs[0]
> $5 = (struct ompi_proc_t *) 0x109a1e8c0
> (gdb) p procs[1]
> $6 = (struct ompi_proc_t *) 0x109a1e970
> (gdb) p procs[0]->proc_flags
> $7 = 1920
> (gdb) p procs[1]->proc_flags
> $8 = 4095
>
> Thus the problem seems to come from the fact that during the
> initialization of the SM coll the flags are not correctly set. However,
> this is somewhat expected, as the call to the initialization happens before
> the exchange of the business cards (and therefore there is no way to have
> any knowledge about the remote procs).
>
> So, either something changed drastically in the way we set the flags for
> remote processes or we did not use the SM coll for the last 3 years. I
> think the culprit is r21967 (
> https://svn.open-mpi.org/trac/ompi/changeset/21967), which added
> "selection" logic based on knowledge about remote procs in the coll SM
> initialization function. But this selection logic was way too early!
>
> I would strongly encourage you not to use this SM collective component in
> anything related to production runs.
>
> george.
>
> PS: However, if you want to toy with the SM coll, apply the following patch:
> Index: coll_sm_module.c
> ===================================================================
> --- coll_sm_module.c (revision 26737)
> +++ coll_sm_module.c (working copy)
> @@ -128,6 +128,7 @@
> int mca_coll_sm_init_query(bool enable_progress_threads,
> bool enable_mpi_threads)
> {
> +#if 0
> ompi_proc_t *my_proc, **procs;
> size_t i, size;
>
> @@ -158,7 +159,7 @@
> "coll:sm:init_query: no other local procs;
> disqualifying myself");
> return OMPI_ERR_NOT_AVAILABLE;
> }
> -
> +#endif
> /* Don't do much here because we don't really want to allocate any
> shared memory until this component is selected to be used. */
> opal_output_verbose(10, mca_coll_base_output,
>
>
>
>
>
> On Jul 4, 2012, at 02:05, Ralph Castain wrote:
>
> Okay, please try this again with r26739 or above. You can remove the rest
> of the "verbose" settings and the --display-map so we declutter the output.
> Please add "-mca orte_nidmap_verbose 20" to your cmd line.
>
> Thanks!
> Ralph
>
>
> On Tue, Jul 3, 2012 at 1:50 PM, Juan A. Rico <jarico_at_[hidden]> wrote:
>
>> Here is the output.
>>
>> [jarico_at_Metropolis-01 examples]$
>> /home/jarico/shared/packages/openmpi-cas-dbg/bin/mpiexec --bind-to-core
>> --bynode --mca mca_base_verbose 100 --mca mca_coll_base_output 100 --mca
>> coll_sm_priority 99 -mca hwloc_base_verbose 90 --display-map --mca
>> mca_verbose 100 --mca mca_base_verbose 100 --mca coll_base_verbose 100 -n 2
>> -mca grpcomm_base_verbose 5 ./bmem
>> [Metropolis-01:24563] mca: base: components_open: Looking for hwloc
>> components
>> [Metropolis-01:24563] mca: base: components_open: opening hwloc components
>> [Metropolis-01:24563] mca: base: components_open: found loaded component
>> hwloc142
>> [Metropolis-01:24563] mca: base: components_open: component hwloc142 has
>> no register function
>> [Metropolis-01:24563] mca: base: components_open: component hwloc142 has
>> no open function
>> [Metropolis-01:24563] hwloc:base:get_topology
>> [Metropolis-01:24563] hwloc:base: no cpus specified - using root
>> available cpuset
>> [Metropolis-01:24563] mca:base:select:(grpcomm) Querying component [bad]
>> [Metropolis-01:24563] mca:base:select:(grpcomm) Query of component [bad]
>> set priority to 10
>> [Metropolis-01:24563] mca:base:select:(grpcomm) Selected component [bad]
>> [Metropolis-01:24563] [[36265,0],0] grpcomm:base:receive start comm
>> --------------------------------------------------------------------------
>> WARNING: a request was made to bind a process. While the system
>> supports binding the process itself, at least one node does NOT
>> support binding memory to the process location.
>>
>> Node: Metropolis-01
>>
>> This is a warning only; your job will continue, though performance may
>> be degraded.
>> --------------------------------------------------------------------------
>> [Metropolis-01:24563] hwloc:base: get available cpus
>> [Metropolis-01:24563] hwloc:base:filter_cpus specified - already done
>> [Metropolis-01:24563] hwloc:base: get available cpus
>> [Metropolis-01:24563] hwloc:base:filter_cpus specified - already done
>> [Metropolis-01:24563] hwloc:base: get available cpus
>> [Metropolis-01:24563] hwloc:base:filter_cpus specified - already done
>> [Metropolis-01:24563] hwloc:base: get available cpus
>> [Metropolis-01:24563] hwloc:base:filter_cpus specified - already done
>> [Metropolis-01:24563] hwloc:base: get available cpus
>> [Metropolis-01:24563] hwloc:base:filter_cpus specified - already done
>> [Metropolis-01:24563] hwloc:base: get available cpus
>> [Metropolis-01:24563] hwloc:base:filter_cpus specified - already done
>> [Metropolis-01:24563] hwloc:base: get available cpus
>> [Metropolis-01:24563] hwloc:base:filter_cpus specified - already done
>> [Metropolis-01:24563] hwloc:base: get available cpus
>> [Metropolis-01:24563] hwloc:base:filter_cpus specified - already done
>> [Metropolis-01:24563] hwloc:base:get_nbojbs computed data 8 of Core:0
>> [Metropolis-01:24563] hwloc:base: get available cpus
>> [Metropolis-01:24563] hwloc:base:filter_cpus specified - already done
>> [Metropolis-01:24563] hwloc:base: get available cpus
>> [Metropolis-01:24563] hwloc:base:filter_cpus specified - already done
>>
>> ======================== JOB MAP ========================
>>
>> Data for node: Metropolis-01 Num procs: 2
>> Process OMPI jobid: [36265,1] App: 0 Process rank: 0
>> Process OMPI jobid: [36265,1] App: 0 Process rank: 1
>>
>> =============================================================
>> [Metropolis-01:24563] [[36265,0],0] grpcomm:bad:xcast sent to job
>> [36265,0] tag 1
>> [Metropolis-01:24563] [[36265,0],0] grpcomm:xcast:recv:send_relay
>> [Metropolis-01:24563] [[36265,0],0] grpcomm:base:xcast updating daemon
>> nidmap
>> [Metropolis-01:24563] [[36265,0],0] orte:daemon:send_relay - recipient
>> list is empty!
>> [Metropolis-01:24564] mca: base: components_open: Looking for hwloc
>> components
>> [Metropolis-01:24564] mca: base: components_open: opening hwloc components
>> [Metropolis-01:24564] mca: base: components_open: found loaded component
>> hwloc142
>> [Metropolis-01:24564] mca: base: components_open: component hwloc142 has
>> no register function
>> [Metropolis-01:24564] mca: base: components_open: component hwloc142 has
>> no open function
>> [Metropolis-01:24565] mca: base: components_open: Looking for hwloc
>> components
>> [Metropolis-01:24565] mca: base: components_open: opening hwloc components
>> [Metropolis-01:24565] mca: base: components_open: found loaded component
>> hwloc142
>> [Metropolis-01:24565] mca: base: components_open: component hwloc142 has
>> no register function
>> [Metropolis-01:24565] mca: base: components_open: component hwloc142 has
>> no open function
>> [Metropolis-01:24564] mca:base:select:(grpcomm) Querying component [bad]
>> [Metropolis-01:24564] mca:base:select:(grpcomm) Query of component [bad]
>> set priority to 10
>> [Metropolis-01:24564] mca:base:select:(grpcomm) Selected component [bad]
>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:receive start comm
>> [Metropolis-01:24564] computing locality - getting object at level CORE,
>> index 0
>> [Metropolis-01:24564] hwloc:base: get available cpus
>> [Metropolis-01:24564] hwloc:base:get_available_cpus first time -
>> filtering cpus
>> [Metropolis-01:24564] hwloc:base: no cpus specified - using root
>> available cpuset
>> [Metropolis-01:24564] computing locality - getting object at level CORE,
>> index 1
>> [Metropolis-01:24564] hwloc:base: get available cpus
>> [Metropolis-01:24564] hwloc:base:filter_cpus specified - already done
>> [Metropolis-01:24564] computing locality - shifting up from L1CACHE
>> [Metropolis-01:24564] computing locality - shifting up from L2CACHE
>> [Metropolis-01:24564] computing locality - shifting up from L3CACHE
>> [Metropolis-01:24564] computing locality - filling level SOCKET
>> [Metropolis-01:24564] computing locality - filling level NUMA
>> [Metropolis-01:24564] locality: CL:CU:N:B:Nu:S
>> [Metropolis-01:24565] mca:base:select:(grpcomm) Querying component [bad]
>> [Metropolis-01:24565] mca:base:select:(grpcomm) Query of component [bad]
>> set priority to 10
>> [Metropolis-01:24565] mca:base:select:(grpcomm) Selected component [bad]
>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:receive start comm
>> [Metropolis-01:24564] mca: base: components_open: Looking for coll
>> components
>> [Metropolis-01:24564] mca: base: components_open: opening coll components
>> [Metropolis-01:24564] mca: base: components_open: found loaded component
>> tuned
>> [Metropolis-01:24564] mca: base: components_open: component tuned has no
>> register function
>> [Metropolis-01:24564] coll:tuned:component_open: done!
>> [Metropolis-01:24564] mca: base: components_open: component tuned open
>> function successful
>> [Metropolis-01:24564] mca: base: components_open: found loaded component
>> sm
>> [Metropolis-01:24564] mca: base: components_open: component sm register
>> function successful
>> [Metropolis-01:24564] mca: base: components_open: component sm has no
>> open function
>> [Metropolis-01:24564] mca: base: components_open: found loaded component
>> libnbc
>> [Metropolis-01:24564] mca: base: components_open: component libnbc
>> register function successful
>> [Metropolis-01:24564] mca: base: components_open: component libnbc open
>> function successful
>> [Metropolis-01:24564] mca: base: components_open: found loaded component
>> hierarch
>> [Metropolis-01:24564] mca: base: components_open: component hierarch has
>> no register function
>> [Metropolis-01:24564] mca: base: components_open: component hierarch open
>> function successful
>> [Metropolis-01:24564] mca: base: components_open: found loaded component
>> basic
>> [Metropolis-01:24564] mca: base: components_open: component basic
>> register function successful
>> [Metropolis-01:24564] mca: base: components_open: component basic has no
>> open function
>> [Metropolis-01:24564] mca: base: components_open: found loaded component
>> inter
>> [Metropolis-01:24564] mca: base: components_open: component inter has no
>> register function
>> [Metropolis-01:24564] mca: base: components_open: component inter open
>> function successful
>> [Metropolis-01:24564] mca: base: components_open: found loaded component
>> self
>> [Metropolis-01:24564] mca: base: components_open: component self has no
>> register function
>> [Metropolis-01:24564] mca: base: components_open: component self open
>> function successful
>> [Metropolis-01:24565] computing locality - getting object at level CORE,
>> index 1
>> [Metropolis-01:24565] hwloc:base: get available cpus
>> [Metropolis-01:24565] hwloc:base:get_available_cpus first time -
>> filtering cpus
>> [Metropolis-01:24565] hwloc:base: no cpus specified - using root
>> available cpuset
>> [Metropolis-01:24565] hwloc:base: get available cpus
>> [Metropolis-01:24565] hwloc:base:filter_cpus specified - already done
>> [Metropolis-01:24565] computing locality - getting object at level CORE,
>> index 0
>> [Metropolis-01:24565] computing locality - shifting up from L1CACHE
>> [Metropolis-01:24565] computing locality - shifting up from L2CACHE
>> [Metropolis-01:24565] computing locality - shifting up from L3CACHE
>> [Metropolis-01:24565] computing locality - filling level SOCKET
>> [Metropolis-01:24565] computing locality - filling level NUMA
>> [Metropolis-01:24565] locality: CL:CU:N:B:Nu:S
>> [Metropolis-01:24563] [[36265,0],0] COLLECTIVE RECVD FROM [[36265,1],0]
>> [Metropolis-01:24563] [[36265,0],0] WORKING COLLECTIVE 0
>> [Metropolis-01:24563] [[36265,0],0] ADDING [[36265,1],WILDCARD] TO
>> PARTICIPANTS
>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLLECTIVE 0
>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLL id 0
>> [Metropolis-01:24563] [[36265,0],0] ALL LOCAL PROCS CONTRIBUTE 2
>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:modex: performing modex
>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:pack_modex: reporting 4
>> entries
>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:full:modex: executing
>> allgather
>> [Metropolis-01:24564] [[36265,1],0] grpcomm:bad entering allgather
>> [Metropolis-01:24564] [[36265,1],0] grpcomm:bad allgather underway
>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:modex: modex posted
>> [Metropolis-01:24565] mca: base: components_open: Looking for coll
>> components
>> [Metropolis-01:24565] mca: base: components_open: opening coll components
>> [Metropolis-01:24565] mca: base: components_open: found loaded component
>> tuned
>> [Metropolis-01:24565] mca: base: components_open: component tuned has no
>> register function
>> [Metropolis-01:24565] coll:tuned:component_open: done!
>> [Metropolis-01:24565] mca: base: components_open: component tuned open
>> function successful
>> [Metropolis-01:24565] mca: base: components_open: found loaded component
>> sm
>> [Metropolis-01:24565] mca: base: components_open: component sm register
>> function successful
>> [Metropolis-01:24565] mca: base: components_open: component sm has no
>> open function
>> [Metropolis-01:24565] mca: base: components_open: found loaded component
>> libnbc
>> [Metropolis-01:24565] mca: base: components_open: component libnbc
>> register function successful
>> [Metropolis-01:24565] mca: base: components_open: component libnbc open
>> function successful
>> [Metropolis-01:24565] mca: base: components_open: found loaded component
>> hierarch
>> [Metropolis-01:24565] mca: base: components_open: component hierarch has
>> no register function
>> [Metropolis-01:24565] mca: base: components_open: component hierarch open
>> function successful
>> [Metropolis-01:24565] mca: base: components_open: found loaded component
>> basic
>> [Metropolis-01:24565] mca: base: components_open: component basic
>> register function successful
>> [Metropolis-01:24565] mca: base: components_open: component basic has no
>> open function
>> [Metropolis-01:24565] mca: base: components_open: found loaded component
>> inter
>> [Metropolis-01:24565] mca: base: components_open: component inter has no
>> register function
>> [Metropolis-01:24565] mca: base: components_open: component inter open
>> function successful
>> [Metropolis-01:24565] mca: base: components_open: found loaded component
>> self
>> [Metropolis-01:24565] mca: base: components_open: component self has no
>> register function
>> [Metropolis-01:24565] mca: base: components_open: component self open
>> function successful
>> [Metropolis-01:24563] [[36265,0],0] COLLECTIVE RECVD FROM [[36265,1],1]
>> [Metropolis-01:24563] [[36265,0],0] WORKING COLLECTIVE 0
>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLLECTIVE 0
>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLL id 0
>> [Metropolis-01:24563] [[36265,0],0] ALL LOCAL PROCS CONTRIBUTE 2
>> [Metropolis-01:24563] [[36265,0],0] COLLECTIVE 0 LOCALLY COMPLETE -
>> SENDING TO GLOBAL COLLECTIVE
>> [Metropolis-01:24563] [[36265,0],0] grpcomm:base:daemon_coll: daemon
>> collective recvd from [[36265,0],0]
>> [Metropolis-01:24563] [[36265,0],0] grpcomm:base:daemon_coll: WORKING
>> COLLECTIVE 0
>> [Metropolis-01:24563] [[36265,0],0] grpcomm:base:daemon_coll: NUM
>> CONTRIBS: 2
>> [Metropolis-01:24563] [[36265,0],0] grpcomm:bad:xcast sent to job
>> [36265,1] tag 30
>> [Metropolis-01:24563] [[36265,0],0] grpcomm:xcast:recv:send_relay
>> [Metropolis-01:24563] [[36265,0],0] orte:daemon:send_relay - recipient
>> list is empty!
>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:modex: performing modex
>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:pack_modex: reporting 4
>> entries
>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:full:modex: executing
>> allgather
>> [Metropolis-01:24565] [[36265,1],1] grpcomm:bad entering allgather
>> [Metropolis-01:24565] [[36265,1],1] grpcomm:bad allgather underway
>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:modex: modex posted
>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:receive processing
>> collective return for id 0
>> [Metropolis-01:24564] [[36265,1],0] CHECKING COLL id 0
>> [Metropolis-01:24564] [[36265,1],0] STORING MODEX DATA
>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:store_modex adding modex
>> entry for proc [[36265,1],0]
>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:receive processing
>> collective return for id 0
>> [Metropolis-01:24565] [[36265,1],1] CHECKING COLL id 0
>> [Metropolis-01:24565] [[36265,1],1] STORING MODEX DATA
>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:store_modex adding modex
>> entry for proc [[36265,1],0]
>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:update_modex_entries:
>> adding 4 entries for proc [[36265,1],0]
>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:store_modex adding modex
>> entry for proc [[36265,1],1]
>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:update_modex_entries:
>> adding 4 entries for proc [[36265,1],1]
>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:update_modex_entries:
>> adding 4 entries for proc [[36265,1],0]
>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:store_modex adding modex
>> entry for proc [[36265,1],1]
>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:update_modex_entries:
>> adding 4 entries for proc [[36265,1],1]
>> [Metropolis-01:24564] coll:find_available: querying coll component tuned
>> [Metropolis-01:24564] coll:find_available: coll component tuned is
>> available
>> [Metropolis-01:24565] coll:find_available: querying coll component tuned
>> [Metropolis-01:24565] coll:find_available: coll component tuned is
>> available
>> [Metropolis-01:24565] coll:find_available: querying coll component sm
>> [Metropolis-01:24564] coll:find_available: querying coll component sm
>> [Metropolis-01:24564] coll:sm:init_query: no other local procs;
>> disqualifying myself
>> [Metropolis-01:24564] coll:find_available: coll component sm is not
>> available
>> [Metropolis-01:24564] coll:find_available: querying coll component libnbc
>> [Metropolis-01:24564] coll:find_available: coll component libnbc is
>> available
>> [Metropolis-01:24564] coll:find_available: querying coll component
>> hierarch
>> [Metropolis-01:24564] coll:find_available: coll component hierarch is
>> available
>> [Metropolis-01:24564] coll:find_available: querying coll component basic
>> [Metropolis-01:24564] coll:find_available: coll component basic is
>> available
>> [Metropolis-01:24565] coll:sm:init_query: no other local procs;
>> disqualifying myself
>> [Metropolis-01:24565] coll:find_available: coll component sm is not
>> available
>> [Metropolis-01:24565] coll:find_available: querying coll component libnbc
>> [Metropolis-01:24565] coll:find_available: coll component libnbc is
>> available
>> [Metropolis-01:24565] coll:find_available: querying coll component
>> hierarch
>> [Metropolis-01:24565] coll:find_available: coll component hierarch is
>> available
>> [Metropolis-01:24565] coll:find_available: querying coll component basic
>> [Metropolis-01:24565] coll:find_available: coll component basic is
>> available
>> [Metropolis-01:24564] coll:find_available: querying coll component inter
>> [Metropolis-01:24564] coll:find_available: coll component inter is
>> available
>> [Metropolis-01:24564] coll:find_available: querying coll component self
>> [Metropolis-01:24564] coll:find_available: coll component self is
>> available
>> [Metropolis-01:24565] coll:find_available: querying coll component inter
>> [Metropolis-01:24565] coll:find_available: coll component inter is
>> available
>> [Metropolis-01:24565] coll:find_available: querying coll component self
>> [Metropolis-01:24565] coll:find_available: coll component self is
>> available
>> [Metropolis-01:24565] hwloc:base:get_nbojbs computed data 0 of NUMANode:0
>> [Metropolis-01:24564] hwloc:base:get_nbojbs computed data 0 of NUMANode:0
>> [Metropolis-01:24563] [[36265,0],0] COLLECTIVE RECVD FROM [[36265,1],1]
>> [Metropolis-01:24563] [[36265,0],0] WORKING COLLECTIVE 1
>> [Metropolis-01:24563] [[36265,0],0] ADDING [[36265,1],WILDCARD] TO
>> PARTICIPANTS
>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLLECTIVE 1
>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLL id 1
>> [Metropolis-01:24563] [[36265,0],0] ALL LOCAL PROCS CONTRIBUTE 2
>> [Metropolis-01:24563] [[36265,0],0] COLLECTIVE RECVD FROM [[36265,1],0]
>> [Metropolis-01:24563] [[36265,0],0] WORKING COLLECTIVE 1
>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLLECTIVE 1
>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLL id 1
>> [Metropolis-01:24563] [[36265,0],0] ALL LOCAL PROCS CONTRIBUTE 2
>> [Metropolis-01:24563] [[36265,0],0] COLLECTIVE 1 LOCALLY COMPLETE -
>> SENDING TO GLOBAL COLLECTIVE
>> [Metropolis-01:24563] [[36265,0],0] grpcomm:base:daemon_coll: daemon
>> collective recvd from [[36265,0],0]
>> [Metropolis-01:24563] [[36265,0],0] grpcomm:base:daemon_coll: WORKING
>> COLLECTIVE 1
>> [Metropolis-01:24563] [[36265,0],0] grpcomm:base:daemon_coll: NUM
>> CONTRIBS: 2
>> [Metropolis-01:24563] [[36265,0],0] grpcomm:bad:xcast sent to job
>> [36265,1] tag 30
>> [Metropolis-01:24563] [[36265,0],0] grpcomm:xcast:recv:send_relay
>> [Metropolis-01:24563] [[36265,0],0] orte:daemon:send_relay - recipient
>> list is empty!
>> [Metropolis-01:24565] [[36265,1],1] grpcomm:bad entering barrier
>> [Metropolis-01:24565] [[36265,1],1] grpcomm:bad barrier underway
>> [Metropolis-01:24564] [[36265,1],0] grpcomm:bad entering barrier
>> [Metropolis-01:24564] [[36265,1],0] grpcomm:bad barrier underway
>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:receive processing
>> collective return for id 1
>> [Metropolis-01:24564] [[36265,1],0] CHECKING COLL id 1
>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:receive processing
>> collective return for id 1
>> [Metropolis-01:24565] [[36265,1],1] CHECKING COLL id 1
>> [Metropolis-01:24565] coll:base:comm_select: new communicator:
>> MPI_COMM_WORLD (cid 0)
>> [Metropolis-01:24565] coll:base:comm_select: Checking all available
>> modules
>> [Metropolis-01:24565] coll:tuned:module_tuned query called
>> [Metropolis-01:24565] coll:base:comm_select: component available: tuned,
>> priority: 30
>> [Metropolis-01:24565] coll:base:comm_select: component available: libnbc,
>> priority: 10
>> [Metropolis-01:24565] coll:base:comm_select: component not available:
>> hierarch
>> [Metropolis-01:24565] coll:base:comm_select: component available: basic,
>> priority: 10
>> [Metropolis-01:24565] coll:base:comm_select: component not available:
>> inter
>> [Metropolis-01:24565] coll:base:comm_select: component not available: self
>> [Metropolis-01:24565] coll:tuned:module_init called.
>> [Metropolis-01:24565] coll:tuned:module_init Tuned is in use
>> [Metropolis-01:24565] coll:base:comm_select: new communicator:
>> MPI_COMM_SELF (cid 1)
>> [Metropolis-01:24565] coll:base:comm_select: Checking all available
>> modules
>> [Metropolis-01:24564] coll:base:comm_select: new communicator:
>> MPI_COMM_WORLD (cid 0)
>> [Metropolis-01:24564] coll:base:comm_select: Checking all available
>> modules
>> [Metropolis-01:24564] coll:tuned:module_tuned query called
>> [Metropolis-01:24564] coll:base:comm_select: component available: tuned,
>> priority: 30
>> [Metropolis-01:24564] coll:base:comm_select: component available: libnbc,
>> priority: 10
>> [Metropolis-01:24564] coll:base:comm_select: component not available:
>> hierarch
>> [Metropolis-01:24564] coll:base:comm_select: component available: basic,
>> priority: 10
>> [Metropolis-01:24564] coll:base:comm_select: component not available:
>> inter
>> [Metropolis-01:24564] coll:base:comm_select: component not available: self
>> [Metropolis-01:24564] coll:tuned:module_init called.
>> [Metropolis-01:24565] coll:tuned:module_tuned query called
>> [Metropolis-01:24565] coll:base:comm_select: component not available:
>> tuned
>> [Metropolis-01:24565] coll:base:comm_select: component available: libnbc,
>> priority: 10
>> [Metropolis-01:24565] coll:base:comm_select: component not available:
>> hierarch
>> [Metropolis-01:24565] coll:base:comm_select: component available: basic,
>> priority: 10
>> [Metropolis-01:24565] coll:base:comm_select: component not available:
>> inter
>> [Metropolis-01:24565] coll:base:comm_select: component available: self,
>> priority: 75
>> [Metropolis-01:24564] coll:tuned:module_init Tuned is in use
>> [Metropolis-01:24564] coll:base:comm_select: new communicator:
>> MPI_COMM_SELF (cid 1)
>> [Metropolis-01:24564] coll:base:comm_select: Checking all available
>> modules
>> [Metropolis-01:24564] coll:tuned:module_tuned query called
>> [Metropolis-01:24564] coll:base:comm_select: component not available:
>> tuned
>> [Metropolis-01:24564] coll:base:comm_select: component available: libnbc,
>> priority: 10
>> [Metropolis-01:24564] coll:base:comm_select: component not available:
>> hierarch
>> [Metropolis-01:24564] coll:base:comm_select: component available: basic,
>> priority: 10
>> [Metropolis-01:24564] coll:base:comm_select: component not available:
>> inter
>> [Metropolis-01:24564] coll:base:comm_select: component available: self,
>> priority: 75
>> [Metropolis-01:24565] [[36265,1],1] grpcomm:bad entering barrier
>> [Metropolis-01:24563] [[36265,0],0] COLLECTIVE RECVD FROM [[36265,1],1]
>> [Metropolis-01:24563] [[36265,0],0] WORKING COLLECTIVE 2
>> [Metropolis-01:24563] [[36265,0],0] ADDING [[36265,1],WILDCARD] TO
>> PARTICIPANTS
>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLLECTIVE 2
>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLL id 2
>> [Metropolis-01:24563] [[36265,0],0] ALL LOCAL PROCS CONTRIBUTE 2
>> [Metropolis-01:24563] [[36265,0],0] COLLECTIVE RECVD FROM [[36265,1],0]
>> [Metropolis-01:24563] [[36265,0],0] WORKING COLLECTIVE 2
>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLLECTIVE 2
>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLL id 2
>> [Metropolis-01:24563] [[36265,0],0] ALL LOCAL PROCS CONTRIBUTE 2
>> [Metropolis-01:24563] [[36265,0],0] COLLECTIVE 2 LOCALLY COMPLETE -
>> SENDING TO GLOBAL COLLECTIVE
>> [Metropolis-01:24563] [[36265,0],0] grpcomm:base:daemon_coll: daemon
>> collective recvd from [[36265,0],0]
>> [Metropolis-01:24563] [[36265,0],0] grpcomm:base:daemon_coll: WORKING
>> COLLECTIVE 2
>> [Metropolis-01:24563] [[36265,0],0] grpcomm:base:daemon_coll: NUM
>> CONTRIBS: 2
>> [Metropolis-01:24563] [[36265,0],0] grpcomm:bad:xcast sent to job
>> [36265,1] tag 30
>> [Metropolis-01:24563] [[36265,0],0] grpcomm:xcast:recv:send_relay
>> [Metropolis-01:24563] [[36265,0],0] orte:daemon:send_relay - recipient
>> list is empty!
>> [Metropolis-01:24564] [[36265,1],0] grpcomm:bad entering barrier
>> [Metropolis-01:24564] [[36265,1],0] grpcomm:bad barrier underway
>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:receive processing
>> collective return for id 2
>> [Metropolis-01:24564] [[36265,1],0] CHECKING COLL id 2
>> [Metropolis-01:24565] [[36265,1],1] grpcomm:bad barrier underway
>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:receive processing
>> collective return for id 2
>> [Metropolis-01:24565] [[36265,1],1] CHECKING COLL id 2
>> [Metropolis-01:24565] coll:tuned:component_close: called
>> [Metropolis-01:24565] coll:tuned:component_close: done!
>> [Metropolis-01:24565] mca: base: close: component tuned closed
>> [Metropolis-01:24565] mca: base: close: unloading component tuned
>> [Metropolis-01:24565] mca: base: close: component libnbc closed
>> [Metropolis-01:24565] mca: base: close: unloading component libnbc
>> [Metropolis-01:24565] mca: base: close: unloading component hierarch
>> [Metropolis-01:24565] mca: base: close: unloading component basic
>> [Metropolis-01:24565] mca: base: close: unloading component inter
>> [Metropolis-01:24565] mca: base: close: unloading component self
>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:receive stop comm
>> [Metropolis-01:24564] coll:tuned:component_close: called
>> [Metropolis-01:24564] coll:tuned:component_close: done!
>> [Metropolis-01:24564] mca: base: close: component tuned closed
>> [Metropolis-01:24564] mca: base: close: unloading component tuned
>> [Metropolis-01:24564] mca: base: close: component libnbc closed
>> [Metropolis-01:24564] mca: base: close: unloading component libnbc
>> [Metropolis-01:24564] mca: base: close: unloading component hierarch
>> [Metropolis-01:24564] mca: base: close: unloading component basic
>> [Metropolis-01:24564] mca: base: close: unloading component inter
>> [Metropolis-01:24564] mca: base: close: unloading component self
>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:receive stop comm
>> [Metropolis-01:24563] [[36265,0],0] grpcomm:bad:xcast sent to job
>> [36265,0] tag 1
>> [Metropolis-01:24563] [[36265,0],0] grpcomm:xcast:recv:send_relay
>> [Metropolis-01:24563] [[36265,0],0] orte:daemon:send_relay - recipient
>> list is empty!
>> [jarico_at_Metropolis-01 examples]$
>>
>>
>>
>> On 03/07/2012, at 21:44, Ralph Castain wrote:
>>
>> > Interesting - yes, coll sm doesn't think they are on the same node for
>> some reason. Try adding -mca grpcomm_base_verbose 5 and let's see why
>> >
>> >
>> > On Jul 3, 2012, at 1:24 PM, Juan Antonio Rico Gallego wrote:
>> >
>> >> The code I run is a simple broadcast.
>> >>
>> >> When I do not specify components to run, the output is (more verbose):
>> >>
>> >> [jarico_at_Metropolis-01 examples]$
>> /home/jarico/shared/packages/openmpi-cas-dbg/bin/mpiexec --mca
>> mca_base_verbose 100 --mca mca_coll_base_output 100 --mca coll_sm_priority
>> 99 -mca hwloc_base_verbose 90 --display-map --mca mca_verbose 100 --mca
>> mca_base_verbose 100 --mca coll_base_verbose 100 -n 2 ./bmem
>> >> [Metropolis-01:24490] mca: base: components_open: Looking for hwloc
>> components
>> >> [Metropolis-01:24490] mca: base: components_open: opening hwloc
>> components
>> >> [Metropolis-01:24490] mca: base: components_open: found loaded
>> component hwloc142
>> >> [Metropolis-01:24490] mca: base: components_open: component hwloc142
>> has no register function
>> >> [Metropolis-01:24490] mca: base: components_open: component hwloc142
>> has no open function
>> >> [Metropolis-01:24490] hwloc:base:get_topology
>> >> [Metropolis-01:24490] hwloc:base: no cpus specified - using root
>> available cpuset
>> >>
>> >> ======================== JOB MAP ========================
>> >>
>> >> Data for node: Metropolis-01 Num procs: 2
>> >> Process OMPI jobid: [36336,1] App: 0 Process rank: 0
>> >> Process OMPI jobid: [36336,1] App: 0 Process rank: 1
>> >>
>> >> =============================================================
>> >> [Metropolis-01:24491] mca: base: components_open: Looking for hwloc
>> components
>> >> [Metropolis-01:24491] mca: base: components_open: opening hwloc
>> components
>> >> [Metropolis-01:24491] mca: base: components_open: found loaded
>> component hwloc142
>> >> [Metropolis-01:24491] mca: base: components_open: component hwloc142
>> has no register function
>> >> [Metropolis-01:24491] mca: base: components_open: component hwloc142
>> has no open function
>> >> [Metropolis-01:24492] mca: base: components_open: Looking for hwloc
>> components
>> >> [Metropolis-01:24492] mca: base: components_open: opening hwloc
>> components
>> >> [Metropolis-01:24492] mca: base: components_open: found loaded
>> component hwloc142
>> >> [Metropolis-01:24492] mca: base: components_open: component hwloc142
>> has no register function
>> >> [Metropolis-01:24492] mca: base: components_open: component hwloc142
>> has no open function
>> >> [Metropolis-01:24491] locality: CL:CU:N:B
>> >> [Metropolis-01:24491] hwloc:base: get available cpus
>> >> [Metropolis-01:24491] hwloc:base:get_available_cpus first time -
>> filtering cpus
>> >> [Metropolis-01:24491] hwloc:base: no cpus specified - using root
>> available cpuset
>> >> [Metropolis-01:24491] hwloc:base:get_available_cpus root object
>> >> [Metropolis-01:24491] mca: base: components_open: Looking for coll
>> components
>> >> [Metropolis-01:24491] mca: base: components_open: opening coll
>> components
>> >> [Metropolis-01:24491] mca: base: components_open: found loaded
>> component tuned
>> >> [Metropolis-01:24491] mca: base: components_open: component tuned has
>> no register function
>> >> [Metropolis-01:24491] coll:tuned:component_open: done!
>> >> [Metropolis-01:24491] mca: base: components_open: component tuned open
>> function successful
>> >> [Metropolis-01:24491] mca: base: components_open: found loaded
>> component sm
>> >> [Metropolis-01:24491] mca: base: components_open: component sm
>> register function successful
>> >> [Metropolis-01:24491] mca: base: components_open: component sm has no
>> open function
>> >> [Metropolis-01:24491] mca: base: components_open: found loaded
>> component libnbc
>> >> [Metropolis-01:24491] mca: base: components_open: component libnbc
>> register function successful
>> >> [Metropolis-01:24491] mca: base: components_open: component libnbc
>> open function successful
>> >> [Metropolis-01:24491] mca: base: components_open: found loaded
>> component hierarch
>> >> [Metropolis-01:24491] mca: base: components_open: component hierarch
>> has no register function
>> >> [Metropolis-01:24491] mca: base: components_open: component hierarch
>> open function successful
>> >> [Metropolis-01:24491] mca: base: components_open: found loaded
>> component basic
>> >> [Metropolis-01:24491] mca: base: components_open: component basic
>> register function successful
>> >> [Metropolis-01:24491] mca: base: components_open: component basic has
>> no open function
>> >> [Metropolis-01:24491] mca: base: components_open: found loaded
>> component inter
>> >> [Metropolis-01:24491] mca: base: components_open: component inter has
>> no register function
>> >> [Metropolis-01:24491] mca: base: components_open: component inter open
>> function successful
>> >> [Metropolis-01:24491] mca: base: components_open: found loaded
>> component self
>> >> [Metropolis-01:24491] mca: base: components_open: component self has
>> no register function
>> >> [Metropolis-01:24491] mca: base: components_open: component self open
>> function successful
>> >> [Metropolis-01:24492] locality: CL:CU:N:B
>> >> [Metropolis-01:24492] hwloc:base: get available cpus
>> >> [Metropolis-01:24492] hwloc:base:get_available_cpus first time -
>> filtering cpus
>> >> [Metropolis-01:24492] hwloc:base: no cpus specified - using root
>> available cpuset
>> >> [Metropolis-01:24492] hwloc:base:get_available_cpus root object
>> >> [Metropolis-01:24492] mca: base: components_open: Looking for coll
>> components
>> >> [Metropolis-01:24492] mca: base: components_open: opening coll
>> components
>> >> [Metropolis-01:24492] mca: base: components_open: found loaded
>> component tuned
>> >> [Metropolis-01:24492] mca: base: components_open: component tuned has
>> no register function
>> >> [Metropolis-01:24492] coll:tuned:component_open: done!
>> >> [Metropolis-01:24492] mca: base: components_open: component tuned open
>> function successful
>> >> [Metropolis-01:24492] mca: base: components_open: found loaded
>> component sm
>> >> [Metropolis-01:24492] mca: base: components_open: component sm
>> register function successful
>> >> [Metropolis-01:24492] mca: base: components_open: component sm has no
>> open function
>> >> [Metropolis-01:24492] mca: base: components_open: found loaded
>> component libnbc
>> >> [Metropolis-01:24492] mca: base: components_open: component libnbc
>> register function successful
>> >> [Metropolis-01:24492] mca: base: components_open: component libnbc
>> open function successful
>> >> [Metropolis-01:24492] mca: base: components_open: found loaded
>> component hierarch
>> >> [Metropolis-01:24492] mca: base: components_open: component hierarch
>> has no register function
>> >> [Metropolis-01:24492] mca: base: components_open: component hierarch
>> open function successful
>> >> [Metropolis-01:24492] mca: base: components_open: found loaded
>> component basic
>> >> [Metropolis-01:24492] mca: base: components_open: component basic
>> register function successful
>> >> [Metropolis-01:24492] mca: base: components_open: component basic has
>> no open function
>> >> [Metropolis-01:24492] mca: base: components_open: found loaded
>> component inter
>> >> [Metropolis-01:24492] mca: base: components_open: component inter has
>> no register function
>> >> [Metropolis-01:24492] mca: base: components_open: component inter open
>> function successful
>> >> [Metropolis-01:24492] mca: base: components_open: found loaded
>> component self
>> >> [Metropolis-01:24492] mca: base: components_open: component self has
>> no register function
>> >> [Metropolis-01:24492] mca: base: components_open: component self open
>> function successful
>> >> [Metropolis-01:24491] coll:find_available: querying coll component
>> tuned
>> >> [Metropolis-01:24491] coll:find_available: coll component tuned is
>> available
>> >> [Metropolis-01:24491] coll:find_available: querying coll component sm
>> >> [Metropolis-01:24491] coll:sm:init_query: no other local procs;
>> disqualifying myself
>> >> [Metropolis-01:24491] coll:find_available: coll component sm is not
>> available
>> >> [Metropolis-01:24491] coll:find_available: querying coll component
>> libnbc
>> >> [Metropolis-01:24491] coll:find_available: coll component libnbc is
>> available
>> >> [Metropolis-01:24491] coll:find_available: querying coll component
>> hierarch
>> >> [Metropolis-01:24491] coll:find_available: coll component hierarch is
>> available
>> >> [Metropolis-01:24491] coll:find_available: querying coll component
>> basic
>> >> [Metropolis-01:24491] coll:find_available: coll component basic is
>> available
>> >> [Metropolis-01:24491] coll:find_available: querying coll component
>> inter
>> >> [Metropolis-01:24492] coll:find_available: querying coll component
>> tuned
>> >> [Metropolis-01:24492] coll:find_available: coll component tuned is
>> available
>> >> [Metropolis-01:24492] coll:find_available: querying coll component sm
>> >> [Metropolis-01:24492] coll:sm:init_query: no other local procs;
>> disqualifying myself
>> >> [Metropolis-01:24492] coll:find_available: coll component sm is not
>> available
>> >> [Metropolis-01:24492] coll:find_available: querying coll component
>> libnbc
>> >> [Metropolis-01:24492] coll:find_available: coll component libnbc is
>> available
>> >> [Metropolis-01:24492] coll:find_available: querying coll component
>> hierarch
>> >> [Metropolis-01:24492] coll:find_available: coll component hierarch is
>> available
>> >> [Metropolis-01:24492] coll:find_available: querying coll component
>> basic
>> >> [Metropolis-01:24492] coll:find_available: coll component basic is
>> available
>> >> [Metropolis-01:24492] coll:find_available: querying coll component
>> inter
>> >> [Metropolis-01:24492] coll:find_available: coll component inter is
>> available
>> >> [Metropolis-01:24492] coll:find_available: querying coll component self
>> >> [Metropolis-01:24492] coll:find_available: coll component self is
>> available
>> >> [Metropolis-01:24491] coll:find_available: coll component inter is
>> available
>> >> [Metropolis-01:24491] coll:find_available: querying coll component self
>> >> [Metropolis-01:24491] coll:find_available: coll component self is
>> available
>> >> [Metropolis-01:24492] hwloc:base:get_nbojbs computed data 0 of
>> NUMANode:0
>> >> [Metropolis-01:24491] hwloc:base:get_nbojbs computed data 0 of
>> NUMANode:0
>> >> [Metropolis-01:24491] coll:base:comm_select: new communicator:
>> MPI_COMM_WORLD (cid 0)
>> >> [Metropolis-01:24491] coll:base:comm_select: Checking all available
>> modules
>> >> [Metropolis-01:24491] coll:tuned:module_tuned query called
>> >> [Metropolis-01:24491] coll:base:comm_select: component available:
>> tuned, priority: 30
>> >> [Metropolis-01:24491] coll:base:comm_select: component available:
>> libnbc, priority: 10
>> >> [Metropolis-01:24491] coll:base:comm_select: component not available:
>> hierarch
>> >> [Metropolis-01:24491] coll:base:comm_select: component available:
>> basic, priority: 10
>> >> [Metropolis-01:24491] coll:base:comm_select: component not available:
>> inter
>> >> [Metropolis-01:24491] coll:base:comm_select: component not available:
>> self
>> >> [Metropolis-01:24491] coll:tuned:module_init called.
>> >> [Metropolis-01:24491] coll:tuned:module_init Tuned is in use
>> >> [Metropolis-01:24491] coll:base:comm_select: new communicator:
>> MPI_COMM_SELF (cid 1)
>> >> [Metropolis-01:24491] coll:base:comm_select: Checking all available
>> modules
>> >> [Metropolis-01:24491] coll:tuned:module_tuned query called
>> >> [Metropolis-01:24491] coll:base:comm_select: component not available:
>> tuned
>> >> [Metropolis-01:24491] coll:base:comm_select: component available:
>> libnbc, priority: 10
>> >> [Metropolis-01:24491] coll:base:comm_select: component not available:
>> hierarch
>> >> [Metropolis-01:24491] coll:base:comm_select: component available:
>> basic, priority: 10
>> >> [Metropolis-01:24491] coll:base:comm_select: component not available:
>> inter
>> >> [Metropolis-01:24491] coll:base:comm_select: component available:
>> self, priority: 75
>> >> [Metropolis-01:24492] coll:base:comm_select: new communicator:
>> MPI_COMM_WORLD (cid 0)
>> >> [Metropolis-01:24492] coll:base:comm_select: Checking all available
>> modules
>> >> [Metropolis-01:24492] coll:tuned:module_tuned query called
>> >> [Metropolis-01:24492] coll:base:comm_select: component available:
>> tuned, priority: 30
>> >> [Metropolis-01:24492] coll:base:comm_select: component available:
>> libnbc, priority: 10
>> >> [Metropolis-01:24492] coll:base:comm_select: component not available:
>> hierarch
>> >> [Metropolis-01:24492] coll:base:comm_select: component available:
>> basic, priority: 10
>> >> [Metropolis-01:24492] coll:base:comm_select: component not available:
>> inter
>> >> [Metropolis-01:24492] coll:base:comm_select: component not available:
>> self
>> >> [Metropolis-01:24492] coll:tuned:module_init called.
>> >> [Metropolis-01:24492] coll:tuned:module_init Tuned is in use
>> >> [Metropolis-01:24492] coll:base:comm_select: new communicator:
>> MPI_COMM_SELF (cid 1)
>> >> [Metropolis-01:24492] coll:base:comm_select: Checking all available
>> modules
>> >> [Metropolis-01:24492] coll:tuned:module_tuned query called
>> >> [Metropolis-01:24492] coll:base:comm_select: component not available:
>> tuned
>> >> [Metropolis-01:24492] coll:base:comm_select: component available:
>> libnbc, priority: 10
>> >> [Metropolis-01:24492] coll:base:comm_select: component not available:
>> hierarch
>> >> [Metropolis-01:24492] coll:base:comm_select: component available:
>> basic, priority: 10
>> >> [Metropolis-01:24492] coll:base:comm_select: component not available:
>> inter
>> >> [Metropolis-01:24492] coll:base:comm_select: component available:
>> self, priority: 75
>> >> [Metropolis-01:24491] coll:tuned:component_close: called
>> >> [Metropolis-01:24491] coll:tuned:component_close: done!
>> >> [Metropolis-01:24492] coll:tuned:component_close: called
>> >> [Metropolis-01:24492] coll:tuned:component_close: done!
>> >> [Metropolis-01:24492] mca: base: close: component tuned closed
>> >> [Metropolis-01:24492] mca: base: close: unloading component tuned
>> >> [Metropolis-01:24492] mca: base: close: component libnbc closed
>> >> [Metropolis-01:24492] mca: base: close: unloading component libnbc
>> >> [Metropolis-01:24492] mca: base: close: unloading component hierarch
>> >> [Metropolis-01:24492] mca: base: close: unloading component basic
>> >> [Metropolis-01:24492] mca: base: close: unloading component inter
>> >> [Metropolis-01:24492] mca: base: close: unloading component self
>> >> [Metropolis-01:24491] mca: base: close: component tuned closed
>> >> [Metropolis-01:24491] mca: base: close: unloading component tuned
>> >> [Metropolis-01:24491] mca: base: close: component libnbc closed
>> >> [Metropolis-01:24491] mca: base: close: unloading component libnbc
>> >> [Metropolis-01:24491] mca: base: close: unloading component hierarch
>> >> [Metropolis-01:24491] mca: base: close: unloading component basic
>> >> [Metropolis-01:24491] mca: base: close: unloading component inter
>> >> [Metropolis-01:24491] mca: base: close: unloading component self
>> >> [jarico_at_Metropolis-01 examples]$
>> >>
>> >>
>> >> SM is not loaded because it detects no other processes on the same
>> machine:
>> >>
>> >> [Metropolis-01:24491] coll:sm:init_query: no other local procs;
>> disqualifying myself
>> >>
>> >> The machine is a multicore node with 8 cores.
>> >>
>> >> I need to run the SM component code, and I suppose that by raising its
>> priority it will be the component selected once the problem is solved.
>> >>
>> >>
>> >>
>> >> On 03/07/2012, at 21:01, Jeff Squyres wrote:
>> >>
>> >>> The issue is that the "sm" coll component only implements a few of
>> the MPI collective operations. It is usually mixed at run-time with other
>> coll components to fill out the rest of the MPI collective operations.
>> >>>
>> >>> So what is happening is that OMPI is determining that it doesn't have
>> implementations of all the MPI collective operations and aborting.
>> >>>
>> >>> You shouldn't need to manually select your coll module -- OMPI should
>> automatically select the right collective module for you. E.g., if all
>> procs are local on a single machine and sm has a matching implementation
>> for that MPI collective operation, it'll be used.
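[Editor's note: the selection behavior Jeff describes can be checked from the command line. The following is a sketch only, assuming `ompi_info` and `mpiexec` from the same Open MPI install are on the PATH; `./bmem` is the test program used earlier in this thread, and the exact parameter listing depends on the build.]

```shell
# List the coll framework components compiled into this install and their
# MCA parameters (the thread shows: tuned, sm, libnbc, hierarch, basic,
# inter, self).
ompi_info --param coll all

# Raise the sm coll priority (as done earlier in the thread) and turn on
# coll verbosity to see which component wins selection per communicator.
mpiexec --mca coll_sm_priority 99 --mca coll_base_verbose 100 -n 2 ./bmem
```
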
>> >>>
>> >>>
>> >>>
>> >>> On Jul 3, 2012, at 2:48 PM, Juan Antonio Rico Gallego wrote:
>> >>>
>> >>>> Output is:
>> >>>>
>> >>>> [Metropolis-01:15355] hwloc:base:get_topology
>> >>>> [Metropolis-01:15355] hwloc:base: no cpus specified - using root
>> available cpuset
>> >>>>
>> >>>> ======================== JOB MAP ========================
>> >>>>
>> >>>> Data for node: Metropolis-01 Num procs: 2
>> >>>> Process OMPI jobid: [59809,1] App: 0 Process rank: 0
>> >>>> Process OMPI jobid: [59809,1] App: 0 Process rank: 1
>> >>>>
>> >>>> =============================================================
>> >>>> [Metropolis-01:15356] locality: CL:CU:N:B
>> >>>> [Metropolis-01:15356] hwloc:base: get available cpus
>> >>>> [Metropolis-01:15356] hwloc:base:get_available_cpus first time -
>> filtering cpus
>> >>>> [Metropolis-01:15356] hwloc:base: no cpus specified - using root
>> available cpuset
>> >>>> [Metropolis-01:15356] hwloc:base:get_available_cpus root object
>> >>>> [Metropolis-01:15357] locality: CL:CU:N:B
>> >>>> [Metropolis-01:15357] hwloc:base: get available cpus
>> >>>> [Metropolis-01:15357] hwloc:base:get_available_cpus first time -
>> filtering cpus
>> >>>> [Metropolis-01:15357] hwloc:base: no cpus specified - using root
>> available cpuset
>> >>>> [Metropolis-01:15357] hwloc:base:get_available_cpus root object
>> >>>> [Metropolis-01:15356] hwloc:base:get_nbojbs computed data 0 of
>> NUMANode:0
>> >>>> [Metropolis-01:15357] hwloc:base:get_nbojbs computed data 0 of
>> NUMANode:0
>> >>>>
>> >>>>
>> >>>> Regards,
>> >>>> Juan A. Rico
>> >>>> _______________________________________________
>> >>>> devel mailing list
>> >>>> devel_at_[hidden]
>> >>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> >>>
>> >>>
>> >>> --
>> >>> Jeff Squyres
>> >>> jsquyres_at_[hidden]
>> >>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> >>>
>> >>>
>> >>
>> >>
>> >
>> >
>>
>>
>>
>
>
>
>
>