Open MPI Development Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: [OMPI devel] Consequence of bind-to-core by default
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2013-12-19 08:59:54


I notice Absoft's MTT runs are failing due to the change in bind-to-core-by-default:

   http://mtt.open-mpi.org/index.php?do_redir=2136

I asked Tony, who runs the Absoft MTT runs; he confirms that this particular machine has 1 socket with 2 cores (and we're running -np 4 on this machine).

1. This is an unintended consequence of the bind-to-core-by-default policy: we fail with "oversubscribed!" when running on a single machine for test runs like this. Do we like this?

See #3, below, for more on this.

2. Also, the error message that is displayed says:

-----
A request was made to bind to that would result in binding more
processes than cpus on a resource:

   Bind to: CORE
   Node: ltljoe3
   #processes: 2
   #cpus: 1
-----

Which is odd: the message reports 2 processes and 1 cpu, but the command line is "mpirun -np 4 --mca btl sm,tcp,self ./c_hello" (4 processes on a machine with 2 cores). Any idea what's happening here?

3. Finally, we're giving a warning saying:

-----
WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.
-----

For both #1 and #3, I wonder whether we should skip these messages when no binding was explicitly requested (i.e., we're just using the defaults). Specifically, if no binding is specified:

- if we oversubscribe, (possibly) warn about the performance loss of oversubscription, and don't bind
- don't warn about lack of memory binding
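
The two bullets above can be sketched as a small decision function. This is only an illustration of the proposed policy, not Open MPI code: the names `decide_binding`, `BindingDecision`, and all parameters are hypothetical, and the behavior for an explicitly requested binding (hard error on oversubscription, warning on missing memory binding) is an assumption about keeping today's behavior in that case.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BindingDecision:
    """Outcome of the (hypothetical) policy check for one node."""
    bind_to_core: bool
    abort: bool
    messages: List[str] = field(default_factory=list)


def decide_binding(nprocs: int, ncores: int, explicit_binding: bool,
                   memory_binding_supported: bool) -> BindingDecision:
    """Sketch of the proposed policy; not the actual Open MPI logic."""
    decision = BindingDecision(bind_to_core=True, abort=False)

    if nprocs > ncores:  # more processes than cores on this node
        decision.bind_to_core = False
        if explicit_binding:
            # Assumption: an explicit request that cannot be met stays an error.
            decision.abort = True
            decision.messages.append(
                "ERROR: binding would place more processes than cpus")
        else:
            # Proposed: with default binding, warn (possibly) and do not bind.
            decision.messages.append(
                "WARNING: node is oversubscribed; not binding processes")

    # Proposed: only mention missing memory binding if binding was requested.
    if (decision.bind_to_core and explicit_binding
            and not memory_binding_supported):
        decision.messages.append(
            "WARNING: memory binding is not supported on this node")

    return decision


if __name__ == "__main__":
    # The Absoft case: -np 4 on 1 socket with 2 cores, no binding requested.
    print(decide_binding(nprocs=4, ncores=2, explicit_binding=False,
                         memory_binding_supported=False))
```

With the Absoft configuration (4 processes, 2 cores, defaults only), this sketch would run the job unbound with a single oversubscription warning instead of aborting.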

Thoughts?

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/