Open MPI User's Mailing List Archives

From: Reese Faucette (reese_at_[hidden])
Date: 2007-06-01 12:46:43


Just to brainstorm on this a little - the two different clusters will have
different "mapper IDs", and this can be learned via the attached code
snippet. As long as fma is the mapper (as opposed to the older, deprecated
"gm_mapper" or "mx_mapper"), then Myrinet topology rules ensure that NIC 0,
port 0 is all you need to examine. All nodes with the same mapper can then
be considered "on the same fabric".

Except, of course, when you have two fabrics A and B with many nodes each
but only one node in common - then all nodes will report the same mapper ID,
but they are effectively two disjoint fabrics. This is rare, but I have seen
it once.
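
To make the "same mapper means same fabric" rule concrete, here is a minimal
sketch of the grouping step. It assumes the mapper MAC of each node (NIC 0,
port 0) has already been collected somehow, for example by running the
attached snippet on every node. The struct node_info type and the
same_fabric() and assign_fabrics() helpers are hypothetical names made up for
this illustration; they are not part of MX or Open MPI. The sketch does not
detect the single-shared-node case described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical record: one entry per node, filled in from the mapper
   query on that node (NIC 0, port 0). */
struct node_info {
  const char *hostname;
  unsigned char mapper_mac[MAC_LEN];
};

/* Two nodes are treated as being on the same fabric when their
   mapper MACs are byte-for-byte identical. */
static int same_fabric(const struct node_info *a, const struct node_info *b)
{
  return memcmp(a->mapper_mac, b->mapper_mac, MAC_LEN) == 0;
}

/* Assign a fabric index to each node: nodes sharing a mapper MAC get the
   same index. Returns the number of distinct mapper MACs seen. */
static size_t assign_fabrics(const struct node_info *nodes, size_t n,
                             size_t *fabric_of)
{
  size_t nfabrics = 0;
  size_t i, j;

  assert(nodes != NULL && fabric_of != NULL);
  for (i = 0; i < n; i++) {
    /* Look for an earlier node that reported the same mapper. */
    for (j = 0; j < i; j++) {
      if (same_fabric(&nodes[i], &nodes[j])) {
        fabric_of[i] = fabric_of[j];
        break;
      }
    }
    /* No earlier match: this node starts a new fabric. */
    if (j == i) {
      fabric_of[i] = nfabrics++;
    }
  }
  return nfabrics;
}
```

The pairwise scan is quadratic in the number of nodes, which should be fine
for a one-time startup check; sorting the MACs first would be the obvious
refinement for very large clusters.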

Perhaps a more general solution is for the MX MTL to look in the MX peer
table for a requested peer (or simply try mx_connect() and notice it
fails?) and report "cannot reach" back up the chain and have higher level
code retry with a different medium on a per-peer basis? This would be
independent of IB or MX or ...
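
A rough sketch of what that per-peer retry could look like, purely as an
illustration of the idea: each medium exposes a "can I reach this peer?"
probe, and the higher-level code walks the media in priority order for every
peer. Everything here is hypothetical, including struct transport,
select_transport(), and the two stub probes; none of it is actual Open MPI
or MX code, and a real probe would wrap something like a peer-table lookup
or a connect attempt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical transport descriptor; not an Open MPI or MX structure. */
struct transport {
  const char *name;
  /* Returns 0 if the peer is reachable over this medium, nonzero
     otherwise (e.g. a failed connect or a missing peer-table entry). */
  int (*try_connect)(const char *peer);
};

/* Stub standing in for a fast-fabric probe: pretends that only peers
   whose name begins with "fabricA-" are reachable. */
static int fast_stub_connect(const char *peer)
{
  return strncmp(peer, "fabricA-", 8) == 0 ? 0 : -1;
}

/* Stub standing in for a fallback medium that reaches every peer. */
static int fallback_stub_connect(const char *peer)
{
  (void)peer;
  return 0;
}

/* Try each transport in priority order and return the first one that
   can reach the peer, or NULL if none can ("cannot reach"). */
static const struct transport *
select_transport(const struct transport *list, size_t n, const char *peer)
{
  size_t i;

  assert(list != NULL && peer != NULL);
  for (i = 0; i < n; i++) {
    if (list[i].try_connect(peer) == 0) {
      return &list[i];
    }
  }
  return NULL;
}
```

With a list of { fast, fallback }, a peer named "fabricA-node3" would be
assigned the fast medium and "fabricB-node7" would fall through to the
fallback, so the decision is made per peer rather than per job.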

===================================
#include <stdio.h>
#include <stdlib.h>
#include "myriexpress.h"
#include "mx_io.h"

int main(void)
{
  mx_return_t ret;
  mx_endpt_handle_t h;
  mx_mapper_state_t ms;
  int board = 0; /* whichever board you want */

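  /* Initialize the MX library before making any other MX call. */
  ret = mx_init();
  if (ret != MX_SUCCESS) {
    fprintf(stderr, "mx_init failed: %s\n", mx_strerror(ret));
    exit(1);
  }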
  ret = mx_open_board(board, &h);
  if (ret != MX_SUCCESS) {
    fprintf(stderr, "Unable to open board %d\n", board);
    exit(1);
  }

  ms.board_number = board;
  ms.iport = 0;
  ret = mx__get_mapper_state(h, &ms);
  if (ret != MX_SUCCESS) {
    fprintf(stderr, "get_mapper_state failed for board %d: %s\n",
        board, mx_strerror(ret));
    exit(1);
  }

  printf("mapper = %2.2x:%2.2x:%2.2x:%2.2x:%2.2x:%2.2x\n",
         ms.mapper_mac[0] & 0xff, ms.mapper_mac[1] & 0xff,
         ms.mapper_mac[2] & 0xff, ms.mapper_mac[3] & 0xff,
         ms.mapper_mac[4] & 0xff, ms.mapper_mac[5] & 0xff);
  exit(0);
}