Open MPI User's Mailing List Archives

Subject: [OMPI users] rankfiles on really big nodes broken?
From: Paul Kapinos (kapinos_at_[hidden])
Date: 2012-01-20 14:43:53

Hello, Open MPI developers!

We now have a really nice toy: 2 TB RAM, 16 sockets, 128 cores.
(Four smaller Bull S6010 machines coupled by BCS chips into a single-image machine.)

On such a big box, process pinning is vital.

So we tried to use the Open MPI capabilities to pin the processes. But it
seems that the rankfile infrastructure does not work properly: we always
get an "Error: Invalid argument" message on the 128-core node, even if the
rankfile is OK.
On a smaller node (up to 32 cores / 64 threads) the very same rankfile
(with the node name changed, of course) works well.

I believe a machine of this size is currently a bit too big for the pinning
infrastructure. A bug?

Best wishes,

Paul Kapinos

P.S. See the attached .tgz for some logs.

Rankfiles provide a means for specifying detailed information
about how process ranks should be mapped to nodes and how they should
be bound. Consider the following:
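
As a hedged illustration of the rankfile format described above, the Python sketch below generates one `rank N=host slot=socket:core` line per core for a single node. The host name `bignode`, the helper name `make_rankfile`, and the assumption that cores are numbered 0..7 within each of the 16 sockets are placeholders, not details taken from the attached logs. It shows only what such a file looks like and does not reproduce or diagnose the reported error.

```python
def make_rankfile(host, sockets, cores_per_socket):
    """Return rankfile text that binds one rank to each core of one host.

    Assumes the 'rank N=host slot=socket:core' syntax and per-socket
    core numbering starting at 0 (an assumption, check your version).
    """
    lines = []
    for rank in range(sockets * cores_per_socket):
        # Fill socket 0 first, then socket 1, and so on.
        socket, core = divmod(rank, cores_per_socket)
        lines.append(f"rank {rank}={host} slot={socket}:{core}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "bignode" is a hypothetical host name; 16 sockets x 8 cores = 128 ranks.
    print(make_rankfile("bignode", sockets=16, cores_per_socket=8), end="")
```

The output could be saved to a file and passed to mpirun with its rankfile option (`-rf <file>` in the 1.5 series, as far as I know). How socket and core indices are interpreted may differ between Open MPI versions.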
                 Open RTE: 1.5.3
    Open RTE SVN revision: r24532
    Open RTE release date: Mar 16, 2011
                     OPAL: 1.5.3
        OPAL SVN revision: r24532
        OPAL release date: Mar 16, 2011
             Ident string: 1.5.3

Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915