Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Mapping ranks to hosts (from MPI error messages)
From: Joe Landman (landman_at_[hidden])
Date: 2014-03-27 14:04:07


On 03/27/2014 01:53 PM, Sasso, John (GE Power & Water, Non-GE) wrote:
> When a piece of software built against OpenMPI fails, I will see an
> error referring to the rank of the MPI task which incurred the failure.
> For example:
>
> MPI_ABORT was invoked on rank 1236 in communicator MPI_COMM_WORLD
> with errorcode 1.
>
> Unfortunately, I do not have access to the software code, just the
> installation directory tree for OpenMPI. My question is: Is there a
> flag that can be passed to mpirun, or an environment variable set, which
> would reveal the mapping of ranks to the hosts they are on?
>
> I do understand that one could have multiple MPI ranks running on the
> same host, but finding a way to determine which rank ran on what host
> would go a long way toward troubleshooting problems that may be
> specific to the host. Thanks!

In the past, I've done something like this (in C, though a similar
approach would work well in Fortran and other languages):

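#include <stdio.h>        /* printf */
#include <sys/utsname.h>  /* uname, struct utsname */
/* ... */
int debug = 1;            /* set to 0 to silence the report */
char *cpu_name;
struct utsname uts;

/* ... later, after MPI_Init/MPI_Comm_rank/MPI_Comm_size ... */
/* rank is assumed to hold the value returned by MPI_Comm_rank */

if (uname(&uts) == 0) {
        cpu_name = uts.nodename;   /* node (host) name */
        if (debug == 1) {
                printf("hostname=%s, I am rank %d\n", cpu_name, rank);
        }
}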
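
For reference, here is a slightly more defensive sketch of the same idea, split into two helpers. The names format_rank_host and rank_from_env are made up for illustration and are not part of MPI or Open MPI. The second helper assumes the launcher exports the rank in an environment variable; Open MPI's mpirun documents OMPI_COMM_WORLD_RANK for this purpose, but check that on your own installation before relying on it.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/utsname.h>

/* Hypothetical helper: read a rank from the environment variable var_name.
 * Returns the rank, or -1 if the variable is unset, empty, or not a
 * non-negative integer that fits in an int. */
int rank_from_env(const char *var_name)
{
    const char *value = getenv(var_name);
    char *end = NULL;
    long rank;

    if (value == NULL || *value == '\0')
        return -1;

    errno = 0;
    rank = strtol(value, &end, 10);
    if (errno != 0 || *end != '\0' || rank < 0 || rank > INT_MAX)
        return -1;

    return (int)rank;
}

/* Hypothetical helper: write "hostname=<node>, I am rank <rank>" into buf.
 * Returns 0 on success, -1 if uname() fails or buf is too small. */
int format_rank_host(char *buf, size_t len, int rank)
{
    struct utsname uts;
    int n;

    if (uname(&uts) != 0)
        return -1;

    n = snprintf(buf, len, "hostname=%s, I am rank %d", uts.nodename, rank);
    if (n < 0 || (size_t)n >= len)
        return -1;

    return 0;
}
```

Inside the application, format_rank_host would be called right after MPI_Comm_rank with the real rank and the buffer printed. Without access to the source, one possible workaround is a tiny wrapper program launched by mpirun that calls rank_from_env("OMPI_COMM_WORLD_RANK"), prints the line, and then execs the real binary; that wrapper is an assumption about your setup, not something tested here.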

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman_at_[hidden]
web  : http://scalableinformatics.com
twtr : @scalableinfo
phone: +1 734 786 8423 x121
cell : +1 734 612 4615