Thank you, Gus! I did go through the mpiexec/mpirun man pages, but it wasn't clear to me that -report-bindings was what I was looking for. I reran a program with --report-bindings, but no bindings were reported.
Scratching my head, I decided to include --bind-to-core as well. Voila, the bindings are reported!
Awesome, but now here is my concern. If we have OpenMPI-based applications launched as batch jobs via a batch scheduler like SLURM, PBS, LSF, etc. (which decides the placement of the app and dispatches it to the compute hosts), will including "--report-bindings --bind-to-core" cause problems? Certainly I can test this, but I am concerned there may be a case where including --bind-to-core causes an unexpected problem I did not account for.
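A possible way to get a rank-to-host listing without touching the binding options at all is a tiny stand-in script launched through mpirun in place of the real application. This is only a sketch: it assumes Open MPI's launcher exports OMPI_COMM_WORLD_RANK and OMPI_COMM_WORLD_SIZE to each process it starts, and the file name rank_host.py and the function rank_host_line are hypothetical names chosen for illustration. The mapping it prints matches the real application's only if both are launched with the same allocation and the same mapping options.

```python
import os
import socket


def rank_host_line(env=None, hostname=None):
    """Return a one-line description of this process's rank and host.

    env and hostname are injectable so the function can be checked
    without an MPI launcher; by default the real environment and the
    local host name are used.
    """
    env = os.environ if env is None else env
    hostname = socket.gethostname() if hostname is None else hostname
    # Assumption: these variables are set by Open MPI's mpirun/mpiexec.
    # Outside such a launch they are absent and "unknown" is reported.
    rank = env.get("OMPI_COMM_WORLD_RANK", "unknown")
    size = env.get("OMPI_COMM_WORLD_SIZE", "unknown")
    return f"rank {rank} of {size} on host {hostname}"


if __name__ == "__main__":
    print(rank_host_line())
```

Run as, for example, "mpirun -np 4 python3 rank_host.py" inside the same batch allocation; each process prints one line. Depending on the Open MPI version, mpirun's --display-map option may give similar information directly, so it is worth checking "man mpirun" for the installed release.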
From: users [mailto:users-bounces_at_[hidden]] On Behalf Of Gus Correa
Sent: Thursday, March 27, 2014 2:06 PM
To: Open MPI Users
Subject: Re: [OMPI users] Mapping ranks to hosts (from MPI error messages)
Take a look at the mpiexec/mpirun options:
-report-bindings (this one should report what you want)
and maybe also:
-bycore, -bysocket, -bind-to-core, -bind-to-socket, ...
and similar, if you want more control over where your MPI processes run.
"man mpiexec" is your friend!
I hope this helps,
On 03/27/2014 01:53 PM, Sasso, John (GE Power & Water, Non-GE) wrote:
> When a piece of software built against OpenMPI fails, I will see an
> error referring to the rank of the MPI task which incurred the failure.
> For example:
> MPI_ABORT was invoked on rank 1236 in communicator MPI_COMM_WORLD
> with errorcode 1.
> Unfortunately, I do not have access to the software code, just the
> installation directory tree for OpenMPI. My question is: Is there a
> flag that can be passed to mpirun, or an environment variable set,
> which would reveal the mapping of ranks to the hosts they are on?
> I do understand that one could have multiple MPI ranks running on the
> same host, but finding a way to determine which rank ran on what host
> would go a long way in helping troubleshoot problems which may be
> central to the host. Thanks!
> users mailing list