Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] SC13 birds of a feather
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-12-04 07:49:03


FWIW: ORTE already has a sensor framework that reads some of these
things, so adding coretemp etc. is pretty trivial. These readings can be
taken in the ORTE event thread on the daemons, but we could also let procs
take them (if the app requests it), or have them driven by the MPI_T
function. If we use the event engine, we could have ORTE push those values
into the internal database, and then provide MPI_T access to retrieve
them.
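
As a rough illustration of the "push into a database, read it back later"
idea (this is not ORTE code; SensorCache, read_sensors and the method names
are all made up for the sketch), a periodic sampler with a cheap accessor
could look something like this in Python:

```python
import threading


class SensorCache:
    """Hypothetical stand-in for a daemon-side store of sensor readings."""

    def __init__(self, read_sensors, interval=1.0):
        # read_sensors is an assumed callable returning {name: value}.
        self._read_sensors = read_sensors
        self._interval = interval
        self._values = {}
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def sample_once(self):
        """Take one reading and push it into the store."""
        values = self._read_sensors()
        with self._lock:
            self._values.update(values)

    def _run(self):
        # Event.wait() returns True once stop() has been called,
        # and False when the interval simply elapsed.
        while not self._stop.wait(self._interval):
            self.sample_once()

    def start(self):
        self.sample_once()  # make sure get() has data right away
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def get(self, name):
        """Return the last pushed value without touching the hardware."""
        with self._lock:
            return self._values.get(name)
```

The point of the split is that get() only ever reads the cached copy, so
the cost of talking to the sensor is paid on the sampling thread rather
than by whoever asks for the value.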

I'm working on the monitoring section over the next few weeks and can add
the data collection part. If Jeff or someone can point me to the required
MPI_T "glue", I can add that too.

On Wed, Dec 4, 2013 at 4:23 AM, Chris Samuel <samuel_at_[hidden]> wrote:

> On Wed, 4 Dec 2013 11:39:29 AM Jeff Squyres wrote:
>
> > On Dec 3, 2013, at 7:54 PM,
> > Christopher Samuel <samuel_at_[hidden]> wrote:
> >
> > > Would it make any sense to expose system/environmental/thermal
> > > information to the application via MPI_T ?
> >
> > Hmm. Interesting idea.
>
> Phew. :-)
>
> > Is the best way to grab such stuff via IPMI?
>
> I don't think so; that means either giving the process permission to
> access /dev/ipmi* or talking over the network to the adapter, neither
> of which is likely to be desirable (or even possible: our iDataplex IMMs
> are not accessible from the compute nodes).
>
> However, using the coretemp kernel module means you get access to at least
> information about CPU temperatures on Intel systems:
>
> /sys/bus/platform/devices/coretemp.${A}/temp${B}_input
>
> which contains the core temperature in 1000ths of a degree Celsius and is
> world readable. You also get access to the various thermal trip points and
> alarms.
>
> The ${B} value is 1 for the CPU package (SandyBridge or later only), then
> sequentially for the physical cores. ${A} is 0 for the first socket, then
> max($B of $A)+1 for the next socket, etc.
>
> So on the test login node of our 2010 era Nehalem iDataplex you get a file
> per CPU core but nothing for the socket, viz:
>
> [root_at_merri-test ~]# ls /sys/bus/platform/devices/coretemp.*/*input*
> /sys/bus/platform/devices/coretemp.0/temp2_input
> /sys/bus/platform/devices/coretemp.0/temp3_input
> /sys/bus/platform/devices/coretemp.0/temp4_input
> /sys/bus/platform/devices/coretemp.0/temp5_input
> /sys/bus/platform/devices/coretemp.4/temp2_input
> /sys/bus/platform/devices/coretemp.4/temp3_input
> /sys/bus/platform/devices/coretemp.4/temp4_input
> /sys/bus/platform/devices/coretemp.4/temp5_input
>
> [root_at_merri-test ~]# cat /sys/bus/platform/devices/coretemp.*/*input*
> 52000
> 52000
> 52000
> 53000
> 59000
> 55000
> 58000
> 56000
>
> On the test login node of our SandyBridge iDataplex delivered mid year we
> get the package as well:
>
> [root_at_barcoo-test ~]# ls /sys/bus/platform/devices/coretemp.*/*input*
> /sys/bus/platform/devices/coretemp.0/temp1_input
> /sys/bus/platform/devices/coretemp.0/temp2_input
> /sys/bus/platform/devices/coretemp.0/temp3_input
> /sys/bus/platform/devices/coretemp.0/temp4_input
> /sys/bus/platform/devices/coretemp.0/temp5_input
> /sys/bus/platform/devices/coretemp.0/temp6_input
> /sys/bus/platform/devices/coretemp.0/temp7_input
> /sys/bus/platform/devices/coretemp.6/temp1_input
> /sys/bus/platform/devices/coretemp.6/temp2_input
> /sys/bus/platform/devices/coretemp.6/temp3_input
> /sys/bus/platform/devices/coretemp.6/temp4_input
> /sys/bus/platform/devices/coretemp.6/temp5_input
> /sys/bus/platform/devices/coretemp.6/temp6_input
> /sys/bus/platform/devices/coretemp.6/temp7_input
>
> [root_at_barcoo-test ~]# cat /sys/bus/platform/devices/coretemp.*/*input*
> 44000
> 43000
> 44000
> 42000
> 43000
> 38000
> 44000
> 37000
> 33000
> 37000
> 32000
> 34000
> 36000
> 33000
>
> There's more information in $KERNEL_SOURCE/Documentation/hwmon/coretemp.
>
> Both those systems are running RHEL6, so it should be fairly well supported
> *if* the sysadmin has loaded the modules.
>
> > That might well be do-able, since there's no performance penalty for
> > reading such values until you actually read the values (i.e., we don't
> > actively monitor these values in OMPI's overall progression engine;
> > they're only read when the application invokes an MPI_T read function).
>
> Indeed, these *shouldn't* hang trying to read them. ;-)
>
> cheers,
> Chris
> --
> Christopher Samuel Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: samuel_at_[hidden] Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/ http://twitter.com/vlsci
>
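
For anyone who wants to try the coretemp files from the quoted message, here
is a minimal Python sketch. It assumes the
/sys/bus/platform/devices/coretemp.${A}/temp${B}_input layout shown above and
that each file holds an integer in millidegrees Celsius (so 52000 means
52 C). The function name read_coretemp and its root parameter are made up
for the example. It globs for the files rather than computing ${A} and ${B},
since the numbering differs between the two systems listed.

```python
import glob
import os
import re
from collections import defaultdict

# Location taken from the quoted message; other kernels may differ.
CORETEMP_ROOT = "/sys/bus/platform/devices"

_INPUT_RE = re.compile(r"coretemp\.(\d+)/temp(\d+)_input$")


def read_coretemp(root=CORETEMP_ROOT):
    """Return {device_index: {sensor_index: degrees_celsius}}."""
    readings = defaultdict(dict)
    pattern = os.path.join(root, "coretemp.*", "temp*_input")
    for path in glob.glob(pattern):
        match = _INPUT_RE.search(path.replace(os.sep, "/"))
        if match is None:
            continue
        device, sensor = int(match.group(1)), int(match.group(2))
        try:
            with open(path) as handle:
                millidegrees = int(handle.read().strip())
        except (OSError, ValueError):
            # Sensor vanished or returned something unexpected; skip it.
            continue
        readings[device][sensor] = millidegrees / 1000.0
    return {dev: dict(sensors) for dev, sensors in readings.items()}


if __name__ == "__main__":
    for device, sensors in sorted(read_coretemp().items()):
        for sensor, celsius in sorted(sensors.items()):
            print(f"coretemp.{device} temp{sensor}: {celsius:.1f} C")
```

If the coretemp module is not loaded the glob simply matches nothing and the
function returns an empty dict, which is an easy way to detect "not
supported here".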