Open MPI User's Mailing List Archives

From: Reese Faucette (reese_at_[hidden])
Date: 2007-01-02 15:52:17


Subject: Ompi failing on mx only

Hi, Gary-
This looks like a config problem, and not a code problem yet. Could you send the output of mx_info from node-1 and from node-2? Also, forgive me for asking a possibly dumb OMPI question in return, but is "-x LD_LIBRARY_PATH" really what you want, as opposed to "-x LD_LIBRARY_PATH=${LD_LIBRARY_PATH}"? (I would not be surprised if not specifying a value defaults to this behavior, but I have to ask.)

Also, have you tried the MX MTL as opposed to the BTL, i.e. "--mca pml cm --mca mtl mx,self"? (It looks like you did.)
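
To keep the variants straight, here they are side by side. This is only a sketch: it prints the command lines rather than running them, the prefix, hostfile and ./cpi are copied from the quoted report, and the helper name "variant" is made up for illustration. Whether any of these changes the outcome is exactly what is being asked, not something the script shows.

```shell
#!/bin/sh
# Sketch only: prints the mpirun variants discussed in this thread
# instead of launching anything.
# PREFIX, HOSTFILE and ./cpi are taken from the quoted report;
# the helper name `variant` is hypothetical.

PREFIX=/usr/local/openmpi-1.2b2
HOSTFILE=./h1-3

variant() {
    # $1 = argument passed to -x, remaining args = MCA component selection
    xarg=$1
    shift
    echo "mpirun --prefix $PREFIX -x $xarg --hostfile $HOSTFILE -np 2 $* ./cpi"
}

# 1. As reported: MX BTL, -x with a bare variable name.
variant 'LD_LIBRARY_PATH' --mca btl mx,self

# 2. Same run, but with the value spelled out explicitly.
variant "LD_LIBRARY_PATH=${LD_LIBRARY_PATH}" --mca btl mx,self

# 3. MX MTL via the cm PML instead of the BTL, as written above.
variant 'LD_LIBRARY_PATH' --mca pml cm --mca mtl mx,self
```

Copy whichever printed line you want to try and run it by hand, so that only one thing changes between attempts.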

"[node-2:10464] mx_connect fail for node-2:0 with key aaaaffff " makes it look like your fabric may not be fully mapped or that you may have a down link.

thanks,
-reese
Myricom, Inc.

  I was initially using 1.1.2 and moved to 1.2b2 because of a hang in MPI_Bcast() which 1.2b2 is reported to fix, and it seems to have done so. My compute nodes are 2 dual-core Xeons on Myrinet with MX. The problem is trying to get ompi running on MX only. My machine file is as follows:

  node-1 slots=4 max-slots=4
  node-2 slots=4 max-slots=4
  node-3 slots=4 max-slots=4

  'mpirun' with the minimum number of processes in order to get the error ...
          mpirun --prefix /usr/local/openmpi-1.2b2 -x LD_LIBRARY_PATH --hostfile ./h1-3 -np 2 --mca btl mx,self ./cpi

  I don't believe there's anything wrong w/ the hardware, as I can ping on mx between this failed node and the master fine. So I tried a different set of 3 nodes and I got the same error; it always fails on the 2nd node of any group of nodes I choose.