On 30 Jan 2010, at 14:57, Samuel Thibault wrote:
> Samuel Thibault wrote on Sat 30 Jan 2010 15:55:00 +0100:
>> #21 implicitly does: "what cpuset they're bound to" is just an example.
>> A configuration function hwloc_topology_set_pid(topology, pid) would
>> mean that the discovery has to be done from the view of the given pid,
>> and thus the allowed_cpuset should be according to that view, thus
>> administrative restrictions.
> Just to give an example: lstopo --pid 1234 would not only show where the
> process is currently bound to, but also its allowed cpuset, which can be
> useful when monitoring applications run by a batch scheduler or such.
It was my request that caused Jeff to file that enhancement request. My take is that #21 should be interpreted as 'report system state from the point of view of <pid> rather than self'. I.e. I don't mind which cpuset is shown, the current or the allowed; all I care about is changing the frame of reference so that the view is what you would see if the same code were called from <pid>.
The reason for this is that it's currently possible to run "mpirun lstopo" to see where processes will be bound, but it's not possible to use lstopo to see the binding of already running jobs. As some of you will be aware, I maintain padb, a 'job inspection' tool, and I believe lstopo and padb could work together to present a job-wide view of process binding across a parallel job.
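To make the "view from <pid>" idea concrete, here is a minimal sketch of reading the current binding of an already running process. It assumes Linux and uses Python's os.sched_getaffinity rather than hwloc, and the helper name current_binding is made up for illustration. It only reports the current binding, not the allowed cpuset.

```python
import os

def current_binding(pid):
    """Return the sorted CPU numbers `pid` is currently bound to, or None.

    Hypothetical helper for illustration; this is not hwloc or lstopo code.
    """
    if not hasattr(os, "sched_getaffinity"):
        return None  # os.sched_getaffinity is not available on this platform
    try:
        return sorted(os.sched_getaffinity(pid))
    except (ProcessLookupError, PermissionError):
        return None  # the process has gone away or we may not inspect it

if __name__ == "__main__":
    # Inspect ourselves here; any visible pid could be passed instead.
    print(current_binding(os.getpid()))
```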
I've already added the code to padb to wrap around lstopo; it's available from SVN and has been for some time. It currently runs lstopo with the --whole-system option for every process within a job, on the correct node. This means the output is not particularly relevant though, hence the change request.
If you are experimenting with this then the following padb command will allow you to play with the command line options provided, %p will be expanded to the pid. I'm curious to see how this pans out in actual use but I believe it's got potential to be very useful indeed.
$ padb --lstopo -Olstopo-show-warning=no -Olstopo-command="lstopo --pid=%p -" -c [ -a | <jobid> ]
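As a rough illustration of the %p expansion used in the command above, the following sketch builds one lstopo command line per pid. It is not padb's actual code; the function names are invented for the example, and the --pid option is the proposed one from #21.

```python
import shlex

def expand_command(template, pid):
    """Replace every %p in the template with the pid and split it into argv."""
    return shlex.split(template.replace("%p", str(pid)))

def commands_for_job(template, pids):
    """Build one argv list per process in the job (hypothetical helper)."""
    return [expand_command(template, pid) for pid in pids]

if __name__ == "__main__":
    # The pids are made up; a real tool would collect them from the job.
    for argv in commands_for_job("lstopo --pid=%p -", [1234, 1235]):
        print(argv)
```

Each resulting list could be handed to something like subprocess.run on the node where that process lives.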
I'm aiming to make a padb release in the next month, with an RC possibly as soon as two weeks away. If I can change the default "lstopo-command" to one that takes a pid before then, that would be great. If not, padb is future-proof, as users can override the default in a configuration file, but this raises the barrier somewhat, as people would need to be aware that this was an option.
Ashley Pittman, Bath, UK.
Padb - A parallel job inspection tool for cluster computing