Open MPI User's Mailing List Archives

From: Kees Verstoep (versto_at_[hidden])
Date: 2007-06-11 11:25:06


George Bosilca wrote:

> A fix for this problem is now available on the trunk. Please use any
> revision after 14963 and your problem will vanish [I hope!]. There are
> now some additional parameters which allow you to select which Myrinet
> network you want to use in the case there are several available (--mca
> btl_mx_if_include and --mca btl_mx_if_exclude). Even multi-rails should
> now work over MX.

I have tried the nightly snapshot openmpi-1.3a1r14981 and it (almost)
seems to work. As shipped, when run in combination with MX-1.2.0j and
the FMA mapper, it currently fails with the following error on each
node:

mx_get_info(MX_LINE_SPEED) failed with status 35 (Bad info length)

However, with the small patch below, multi-cluster jobs indeed seem
to be running fine (using MX locally). I'll do some more testing
later this week.

Thanks a lot for the fix!
Kees

*** ./ompi/mca/btl/mx/btl_mx_component.c.orig 2007-06-11 17:12:11.000000000 +0200
--- ./ompi/mca/btl/mx/btl_mx_component.c 2007-06-11 17:13:34.000000000 +0200
***************
*** 310,316 ****
   #if defined(MX_HAS_NET_TYPE)
       {
           int value;
!          if( (status = mx_get_info( mx_btl->mx_endpoint, MX_LINE_SPEED, NULL, 0,
                                      &value, sizeof(int))) != MX_SUCCESS ) {
               opal_output( 0, "mx_get_info(MX_LINE_SPEED) failed with status %d (%s)\n",
                            status, mx_strerror(status) );
--- 310,317 ----
   #if defined(MX_HAS_NET_TYPE)
       {
           int value;
!          if( (status = mx_get_info( mx_btl->mx_endpoint, MX_LINE_SPEED,
!                                     &nic_id, sizeof(nic_id),
                                      &value, sizeof(int))) != MX_SUCCESS ) {
               opal_output( 0, "mx_get_info(MX_LINE_SPEED) failed with status %d (%s)\n",
                            status, mx_strerror(status) );
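
For readers without MX at hand, here is a rough, self-contained sketch of why the extra argument seems to matter. It does not use the real MX API: demo_get_info, the DEMO_* names, the 64-bit nic_id type and the speed values are all made up for illustration. The only thing taken from the report above is status 35 ("Bad info length"). The assumption is that a per-NIC key such as the line speed needs to be told which NIC to query, so a call passing NULL and 0 as input is rejected on length grounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins, NOT the MX library. 35 mirrors the status
 * number printed in the error message quoted in this thread. */
enum { DEMO_SUCCESS = 0, DEMO_BAD_INFO_LENGTH = 35 };
enum demo_key { DEMO_NIC_COUNT, DEMO_LINE_SPEED };

/* Mock of an in/out "get info" call: some keys are global and take no
 * input, others are per-NIC and need the NIC id as input. */
static int demo_get_info(enum demo_key key,
                         const void *in_val, size_t in_len,
                         void *out_val, size_t out_len)
{
    assert(out_val != NULL);            /* caller must supply a result buffer */

    switch (key) {
    case DEMO_NIC_COUNT: {
        /* Global query: no input expected. */
        uint32_t count = 2;             /* made-up value */
        if (in_len != 0 || out_len != sizeof count)
            return DEMO_BAD_INFO_LENGTH;
        memcpy(out_val, &count, sizeof count);
        return DEMO_SUCCESS;
    }
    case DEMO_LINE_SPEED: {
        /* Per-NIC query: the input must be exactly one NIC id. */
        uint64_t nic_id;                /* assumed width, for the sketch only */
        int speed;
        if (in_val == NULL || in_len != sizeof nic_id || out_len != sizeof speed)
            return DEMO_BAD_INFO_LENGTH;
        memcpy(&nic_id, in_val, sizeof nic_id);
        speed = (nic_id % 2 == 0) ? 2 : 10;   /* made-up speeds */
        memcpy(out_val, &speed, sizeof speed);
        return DEMO_SUCCESS;
    }
    }
    return DEMO_BAD_INFO_LENGTH;        /* unknown key */
}
```

With this mock, calling demo_get_info(DEMO_LINE_SPEED, NULL, 0, &value, sizeof(int)) returns 35 and leaves value untouched, while passing &nic_id and sizeof(nic_id) as the input pair succeeds. That is the same shape of change as in the patch above, although the real library's checks may of course differ.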