Open MPI User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-06-05 08:50:39


FWIW, we do something similar in the openib BTL -- we use the subnet
ID to determine whether two IB ports are connected. (We have the rule
in OMPI that physically disconnected subnets must have different IDs;
this is more stringent than the IB spec calls for.) See:

http://www.open-mpi.org/faq/?category=openfabrics#ofa-default-subnet-gid
http://www.open-mpi.org/faq/?category=openfabrics#ofa-which-subnet-id
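
To make the subnet-ID comparison concrete, here is a minimal sketch in
C. It is not the openib BTL's actual code. It assumes only that an IB
GID is 16 bytes whose upper 8 bytes are the subnet prefix; the helper
name same_subnet() is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* An InfiniBand GID is 128 bits: a 64-bit subnet prefix (the
 * "subnet ID") followed by a 64-bit port GUID. */
#define GID_LEN           16
#define SUBNET_PREFIX_LEN 8

/* Hypothetical helper, not an Open MPI function: returns 1 if both
 * GIDs carry the same subnet prefix, 0 otherwise. Under the OMPI rule
 * described above, equal prefixes are taken to mean "connected". */
static int same_subnet(const uint8_t a[GID_LEN], const uint8_t b[GID_LEN])
{
    assert(a != NULL && b != NULL);
    return memcmp(a, b, SUBNET_PREFIX_LEN) == 0;
}
```

Two ports that both use the factory-default prefix fe80:0000:0000:0000
would compare as "same subnet" here even if they sit on physically
separate fabrics, which is why the rule about distinct IDs matters.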

On Jun 1, 2007, at 12:46 PM, Reese Faucette wrote:

> Just to brainstorm on this a little - the two different clusters will
> have different "mapper IDs", and this can be learned via the attached
> code snippet. As long as fma is the mapper (as opposed to the older,
> deprecated "gm_mapper" or "mx_mapper"), Myrinet topology rules ensure
> that NIC 0, port 0 is all you need to examine. All nodes with the same
> mapper can then be considered "on the same fabric".
>
> Except, of course, when you have two fabrics A and B with many nodes
> each but only one node in common - then all nodes will have the same
> mapper ID, but they are effectively two disjoint fabrics. This is
> rare, but I have seen it once.
>
> Perhaps a more general solution is for the MX MTL to look in the MX
> peer table for a requested peer (or simply try mx_connect() and
> notice it fails?) and report "cannot reach" back up the chain and
> have higher level code retry with a different medium on a per-peer
> basis? This would be independent of IB or MX or ...
>
> ===================================
> #include <stdio.h>
> #include <stdlib.h>
> #include "myriexpress.h"
> #include "mx_io.h"
>
> int main(void)
> {
>     mx_return_t ret;
>     mx_endpt_handle_t h;
>     mx_mapper_state_t ms;
>     int board = 0; /* whichever board you want */
>
>     mx_init();
>     ret = mx_open_board(board, &h);
>     if (ret != MX_SUCCESS) {
>         fprintf(stderr, "Unable to open board %d\n", board);
>         exit(1);
>     }
>
>     /* Ask for the mapper state as seen from port 0 of this board. */
>     ms.board_number = board;
>     ms.iport = 0;
>     ret = mx__get_mapper_state(h, &ms);
>     if (ret != MX_SUCCESS) {
>         fprintf(stderr, "get_mapper_state failed for board %d: %s\n",
>                 board, mx_strerror(ret));
>         exit(1);
>     }
>
>     /* The mapper's MAC address is the "mapper ID" of the fabric. */
>     printf("mapper = %2.2x:%2.2x:%2.2x:%2.2x:%2.2x:%2.2x\n",
>            ms.mapper_mac[0] & 0xff, ms.mapper_mac[1] & 0xff,
>            ms.mapper_mac[2] & 0xff, ms.mapper_mac[3] & 0xff,
>            ms.mapper_mac[4] & 0xff, ms.mapper_mac[5] & 0xff);
>     return 0;
> }
>
>
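
The per-peer fallback idea in the quoted message could be sketched
roughly as below. This is only an illustration under stated
assumptions: transport_t, pick_transport(), and the two stub
reachability checks are all hypothetical names, not Open MPI or MX
APIs. A real check might consult the MX peer table or attempt
mx_connect(), as the quoted text suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transport descriptor: a name plus a callback that
 * reports whether a given peer can be reached over this medium. */
typedef struct {
    const char *name;
    int (*can_reach)(int peer);   /* 1 = reachable, 0 = "cannot reach" */
} transport_t;

/* Stub checks for illustration only: pretend the fast fabric reaches
 * peers 0..3, while the fallback medium reaches any valid peer. */
static int mx_stub_can_reach(int peer)  { return peer >= 0 && peer < 4; }
static int tcp_stub_can_reach(int peer) { return peer >= 0; }

/* Transports listed in order of preference. */
static const transport_t demo_transports[] = {
    { "mx-stub",  mx_stub_can_reach  },
    { "tcp-stub", tcp_stub_can_reach },
};

/* Walk the list and return the first transport that can reach the
 * peer, or NULL if none can. The decision is made per peer, so
 * different peers may end up on different media. */
static const transport_t *pick_transport(const transport_t *list,
                                         size_t n, int peer)
{
    size_t i;

    assert(list != NULL);
    for (i = 0; i < n; i++) {
        if (list[i].can_reach(peer)) {
            return &list[i];
        }
    }
    return NULL;
}
```

With these stubs, peer 2 would be assigned the first transport and
peer 7 would fall through to the second, independent of whether the
preferred medium is IB or MX.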

-- 
Jeff Squyres
Cisco Systems