Open MPI Development Mailing List Archives

Subject: [OMPI devel] Consequence of bind-to-core by default
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2013-12-19 08:59:54


I notice Absoft's MTT runs are failing due to the change in bind-to-core-by-default:

   http://mtt.open-mpi.org/index.php?do_redir=2136

I asked Tony, who runs the Absoft MTT runs; he confirms that this particular machine has 1 socket with 2 cores (and we're running -np 4 on this machine).
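
(For reference, I'm assuming c_hello is just the usual rank-printing hello world; I haven't checked the exact source in Absoft's MTT setup, but roughly:)

-----
/* Minimal MPI hello of the kind I assume c_hello to be: each rank
   prints its rank and the communicator size. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello, world, I am %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
-----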

1. This is an unintended consequence of the bind-to-core-by-default policy: test runs like this one, which launch more processes than there are cores on a single machine, now fail with an "oversubscribed!" error. Do we like this?

See #3, below, for more on this.

2. Also, the error message that is displayed says:

-----
A request was made to bind to that would result in binding more
processes than cpus on a resource:

   Bind to: CORE
   Node: ltljoe3
   #processes: 2
   #cpus: 1
-----

Which is odd: the command line is "mpirun -np 4 --mca btl sm,tcp,self ./c_hello", yet the message reports 2 processes and 1 cpu on a 1-socket/2-core machine. Any idea what's happening here?
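
If it would help to chase this down, one option is to have the test itself print the cpu set each rank actually lands on (on a configuration where the job does launch). A Linux-only sketch, since it relies on sched_getaffinity():

-----
/* Linux-only sketch: each rank prints the set of cpus it is
   currently allowed to run on, so we can see what mpirun did. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, i, len = 0;
    cpu_set_t mask;
    char buf[1024] = "";

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    CPU_ZERO(&mask);
    if (0 == sched_getaffinity(0, sizeof(mask), &mask)) {
        for (i = 0; i < CPU_SETSIZE && len < (int) sizeof(buf) - 8; ++i) {
            if (CPU_ISSET(i, &mask)) {
                len += snprintf(buf + len, sizeof(buf) - len, "%d ", i);
            }
        }
        printf("Rank %d: allowed cpus: %s\n", rank, buf);
    } else {
        printf("Rank %d: sched_getaffinity() failed\n", rank);
    }

    MPI_Finalize();
    return 0;
}
-----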

3. Finally, we're giving a warning saying:

-----
WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.
-----
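
As far as I know, that warning is driven by the hwloc memory-binding support flags we detect on each node. A quick standalone check on the node in question (a sketch using the standard hwloc topology-support API; build with -lhwloc) would be something like:

-----
/* Print whether hwloc thinks this node supports binding the current
   process to cpus and to memory. */
#include <stdio.h>
#include <hwloc.h>

int main(void)
{
    hwloc_topology_t topo;
    const struct hwloc_topology_support *support;

    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    support = hwloc_topology_get_support(topo);
    printf("cpubind (this process): %s\n",
           support->cpubind->set_thisproc_cpubind ? "yes" : "no");
    printf("membind (this process): %s\n",
           support->membind->set_thisproc_membind ? "yes" : "no");

    hwloc_topology_destroy(topo);
    return 0;
}
-----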

For both #1 and #3, I wonder whether we should be warning/erroring at all when no binding was explicitly requested (i.e., we're just using the defaults). Specifically, if no binding is specified (see the sketch after this list):

- if we oversubscribe, (possibly) warn about the performance loss of oversubscription, and don't bind
- don't warn about lack of memory binding
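
To make that concrete, here is a rough sketch of the decision logic I have in mind. This is not actual ORTE code and all of the names are made up; it is just the proposed policy written out in C:

-----
#include <stdio.h>

/* Hypothetical inputs to the binding decision. */
typedef struct {
    int binding_explicitly_requested;  /* user gave a binding option */
    int would_oversubscribe;           /* more local procs than cores */
    int node_supports_membind;         /* node can bind memory */
} bind_state_t;

/* Returns -1 = abort with error, 0 = run unbound, 1 = bind. */
static int decide_binding(const bind_state_t *s)
{
    if (!s->binding_explicitly_requested) {
        if (s->would_oversubscribe) {
            /* (possibly) warn about the performance cost of
               oversubscription, but do not bind and do not abort */
            return 0;
        }
        /* bind by default, and stay silent about membind support */
        return 1;
    }

    /* The user explicitly asked to bind: keep today's behavior,
       i.e., error on oversubscription and warn about membind. */
    if (s->would_oversubscribe) {
        return -1;
    }
    if (!s->node_supports_membind) {
        /* emit the memory-binding warning, then bind anyway */
    }
    return 1;
}

int main(void)
{
    /* The Absoft case: defaults only, 4 procs on a 2-core node. */
    bind_state_t s = { 0, 1, 1 };
    printf("decision = %d (expect 0: run unbound, maybe warn)\n",
           decide_binding(&s));
    return 0;
}
-----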

Thoughts?

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/