Hello, Open MPI developers!
We now have a really nice toy: 2 TB of RAM, 16 sockets, 128 cores.
(Four smaller Bull S6010 boxes coupled by BCS chips into a single-image machine.)
On such a big box, process pinning is vital.
So we tried to use the Open MPI capabilities to pin the processes. But it
seems that the rankfile infrastructure does not work properly: we always
get an "Error: Invalid argument" message on the 128-core node, even when
the rankfile is OK.
On a smaller node (up to 32 cores / 64 threads) the very same rankfile
(with the node name changed, of course) works well.
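To illustrate the kind of rankfile meant here, the following is only a sketch that generates one rank per core in the "rank N=host slot=socket:core" syntax. The host name "bignode" and the helper make_rankfile are hypothetical, and the layout of 8 cores on each of 16 sockets is merely inferred from the totals above; the real socket and core numbering on the machine may differ.

```python
def make_rankfile(host, sockets, cores_per_socket):
    """Build rankfile text that pins one rank to each core.

    Assumes sockets and cores are numbered contiguously from 0,
    which may not match the numbering on a real machine.
    """
    lines = []
    rank = 0
    for socket in range(sockets):
        for core in range(cores_per_socket):
            # Open MPI rankfile syntax: rank <N>=<host> slot=<socket>:<core>
            lines.append(f"rank {rank}={host} slot={socket}:{core}")
            rank += 1
    return "\n".join(lines) + "\n"


# Hypothetical host name; 16 sockets x 8 cores = 128 ranks.
rankfile_text = make_rankfile("bignode", 16, 8)
rankfile_lines = rankfile_text.splitlines()
```

The resulting text would be saved to a file and passed to mpirun via its -rf (--rankfile) option.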
I believe this machine is currently a bit too big for the pinning
infrastructure. A bug?
Best wishes,
Paul Kapinos
P.S. See the attached .tgz for some logs.
------------------------------------------------------------------------------
Rankfiles
Rankfiles provide a means for specifying detailed information
about how process ranks should be mapped to nodes and how they should
be bound. Consider the following:
....
------------------------------------------------------------------------------
Open RTE: 1.5.3
Open RTE SVN revision: r24532
Open RTE release date: Mar 16, 2011
OPAL: 1.5.3
OPAL SVN revision: r24532
OPAL release date: Mar 16, 2011
Ident string: 1.5.3
--
Dipl.-Inform. Paul Kapinos - High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23, D 52074 Aachen (Germany)
Tel: +49 241/80-24915