
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] two questions about 1.7.1
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2013-06-19 09:26:36


On Jun 19, 2013, at 7:52 AM, Paul Kapinos <kapinos_at_[hidden]> wrote:

> Hello All,
>
> I.
> Using the new Open MPI 1.7.1 we see some messages on the console:
>
> > example mpiext init
> > example mpiext fini
>
> ... on each call to MPI_INIT and MPI_FINALIZE, at least in Fortran programs.
>
> It seems somebody forgot to disable some 'printf' debug output? =)

This is actually from the mpiext example plugin, not from the Fortran code in OMPI. It's example code, so it has printf's in it. I'm a little surprised to see that output, though -- I wonder if it's somehow getting enabled when it shouldn't be...?

How did you configure/compile Open MPI?

> II.
> In the 1.7.x series, the 'carto' framework has been deleted:
> http://www.open-mpi.org/community/lists/announce/2013/04/0053.php
> > - Removed maffinity, paffinity, and carto frameworks (and associated
> > MCA params).
>
> Is there a replacement for this? Or will Open MPI detect the NUMA structure of nodes automatically?

Yes. OMPI uses hwloc internally now to figure this stuff out.

> Background: Currently we're using the 'carto' framework on our kinda special 'Bull BCS' nodes. Each such node consists of 4 boards, each with its own IB card, which together form a shared-memory system. Clearly, communication should go over the nearest IB interface; for this we use 'carto' now.

It should do this automatically in the 1.7 series.

Hmm; I see there isn't any verbose output about which devices it picks, though. :-( Try this patch, run with --mca btl_base_verbose 100, and check whether the appropriate devices are being mapped to the appropriate processes:

Index: mca/btl/openib/btl_openib_component.c
===================================================================
--- mca/btl/openib/btl_openib_component.c (revision 28652)
+++ mca/btl/openib/btl_openib_component.c (working copy)
@@ -2712,6 +2712,8 @@
                 mca_btl_openib_component.ib_num_btls <
                 mca_btl_openib_component.ib_max_btls); i++) {
         if (distance != dev_sorted[i].distance) {
+            BTL_VERBOSE(("openib: skipping device %s; it's too far away",
+                         ibv_get_device_name(dev_sorted[i].ib_dev)));
             break;
         }
 
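The loop the patch instruments can be sketched in isolation. This is a minimal, self-contained illustration under stated assumptions, not the actual openib code: `struct dev_entry`, `cmp_distance()`, `select_nearest()` and the integer distances are hypothetical stand-ins for `dev_sorted[]` and the per-device distances in `btl_openib_component.c` (assumed here to come from hwloc's topology information, per the reply above). It sorts devices by distance, keeps only those tied at the nearest distance (up to a maximum count), and stops at the first farther device, which is the point where the patched `BTL_VERBOSE` line fires.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for one entry of the sorted device list:
 * a device name and its distance from the calling process
 * (smaller = closer). Not the real openib data structure. */
struct dev_entry {
    const char *name;
    int distance;
};

/* qsort comparator: ascending distance, so the nearest devices come first. */
static int cmp_distance(const void *a, const void *b)
{
    const struct dev_entry *x = a, *y = b;
    return (x->distance > y->distance) - (x->distance < y->distance);
}

/* Hypothetical helper: sorts devs by distance and returns how many
 * devices (at most max_devs) share the minimum distance; those are the
 * ones a process would use. The first farther device is reported as
 * skipped and the loop stops, mirroring the break in the patched loop. */
static size_t select_nearest(struct dev_entry *devs, size_t n, size_t max_devs)
{
    size_t i, used = 0;

    assert(devs != NULL || n == 0);
    if (n == 0) {
        return 0;
    }
    qsort(devs, n, sizeof(devs[0]), cmp_distance);

    for (i = 0; i < n && used < max_devs; i++) {
        if (devs[0].distance != devs[i].distance) {
            printf("skipping device %s; it's too far away\n", devs[i].name);
            break;
        }
        used++;
    }
    return used;
}
```

For example, with four made-up devices at distances 2, 1, 1 and 3 and no cap below 4, `select_nearest()` returns 2: the process uses the two devices at distance 1, and the distance-2 device is the one reported as skipped.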

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/