Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Mapping ranks to hosts (from MPI error messages)
From: Sasso, John (GE Power & Water, Non-GE) (John1.Sasso_at_[hidden])
Date: 2014-03-27 14:41:39


Thank you, Gus! I did go through the mpiexec/mpirun man pages, but it wasn't clear to me that -report-bindings was what I was looking for. So I reran a program with --report-bindings, but no bindings were reported.

Scratching my head, I decided to include --bind-to-core as well. Voila, the bindings are reported!
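
Since the original goal was a rank-to-host listing without touching the application source, one alternative is to launch a tiny stand-in script under the same mpirun arguments as the real job. The following is only a minimal sketch: it assumes the Open MPI launcher exports OMPI_COMM_WORLD_RANK and OMPI_COMM_WORLD_SIZE to each process it starts, and the file name rank_host.py and the function rank_host_line are hypothetical names chosen for illustration.

```python
import os
import socket


def rank_host_line(environ=None, hostname=None):
    """Build a 'rank R of N on host H' line from the launcher environment.

    Assumption: the process was started by Open MPI's mpirun, which sets
    OMPI_COMM_WORLD_RANK and OMPI_COMM_WORLD_SIZE. If they are missing
    (e.g. the script is run by hand), '?' is printed instead.
    """
    environ = os.environ if environ is None else environ
    hostname = socket.gethostname() if hostname is None else hostname
    rank = environ.get("OMPI_COMM_WORLD_RANK", "?")
    size = environ.get("OMPI_COMM_WORLD_SIZE", "?")
    return f"rank {rank} of {size} on host {hostname}"


if __name__ == "__main__":
    # One line per launched process; flush so lines are not lost on exit.
    print(rank_host_line(), flush=True)
```

Launched as "mpirun <same options as the real job> python rank_host.py", this prints one line per rank. The listing reflects the real job's placement only if the same host allocation and mapping options are used for both runs.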

Awesome, but here is my concern. If OpenMPI-based applications are launched as batch jobs via a batch scheduler like SLURM, PBS, LSF, etc. (which decides the placement of the app and dispatches it to the compute hosts), will including "--report-bindings --bind-to-core" cause problems? I can certainly test this, but I am concerned there may be a case where including --bind-to-core causes an unexpected problem I did not account for.

--john

-----Original Message-----
From: users [mailto:users-bounces_at_[hidden]] On Behalf Of Gus Correa
Sent: Thursday, March 27, 2014 2:06 PM
To: Open MPI Users
Subject: Re: [OMPI users] Mapping ranks to hosts (from MPI error messages)

Hi John

Take a look at the mpiexec/mpirun options:

-report-bindings (this one should report what you want)

and maybe also:

-bycore, -bysocket, -bind-to-core, -bind-to-socket, ...

and similar, if you want more control over where your MPI processes run.

"man mpiexec" is your friend!

I hope this helps,
Gus Correa

On 03/27/2014 01:53 PM, Sasso, John (GE Power & Water, Non-GE) wrote:
> When a piece of software built against OpenMPI fails, I will see an
> error referring to the rank of the MPI task which incurred the failure.
> For example:
>
> MPI_ABORT was invoked on rank 1236 in communicator MPI_COMM_WORLD
>
> with errorcode 1.
>
> Unfortunately, I do not have access to the software code, just the
> installation directory tree for OpenMPI. My question is: Is there a
> flag that can be passed to mpirun, or an environment variable set,
> which would reveal the mapping of ranks to the hosts they are on?
>
> I do understand that one could have multiple MPI ranks running on the
> same host, but finding a way to determine which rank ran on what host
> would go a long way in helping troubleshoot problems which may be
> central to the host. Thanks!
>
> --john
>
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
