Hi Matt,
It seems that the right way to do it is the following:
-mca btl openib,self -mca btl_openib_ib_pkey_val 33033
where the value is the pkey written as a decimal number (in your case 0x8109 = 33033). There is no need for the btl_openib_ib_pkey_ix value.
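For anyone who wants to double-check the hex-to-decimal conversion, here is a quick shell sketch. The helper name pkey_to_decimal is made up for illustration and is not part of Open MPI.

```shell
# pkey_to_decimal is a hypothetical helper, not an Open MPI tool.
# printf's %d accepts a 0x-prefixed hex argument and prints it in decimal.
pkey_to_decimal() {
    printf '%d\n' "$1"
}

pkey_to_decimal 0x8109   # prints 33033, the value used above
```

The same call with 0x8001 gives 32769, which matches the value in the example command line.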
ex.
mpirun -np 2 -H witch2,witch3 -mca btl openib,self -mca btl_openib_ib_pkey_val 32769 ./mpi_p1_4_1_2 -t lt
LT (2) (size min max avg) 1 3.511429 3.511429 3.511429
If it's not working, check cat /sys/class/infiniband/mthca0/ports/1/pkeys/* for the pkeys and check the SM; maybe it's a setup issue.
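A rough sketch of that pkey check, assuming the sysfs layout from the path above (one file per table index, each holding one pkey in hex, with unused slots reading 0x0000). The function name list_pkeys is hypothetical.

```shell
# list_pkeys is a hypothetical helper; pass it the pkeys directory of a port.
# Each file in that directory is named after a table index and holds one pkey.
list_pkeys() {
    dir="$1"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        val=$(cat "$f")
        # Unused table slots read as 0x0000; skip them.
        [ "$val" = "0x0000" ] && continue
        printf 'index %s: %s\n' "$(basename "$f")" "$val"
    done
}

# Path taken from the message above; adjust the device and port for your HCA.
list_pkeys /sys/class/infiniband/mthca0/ports/1/pkeys
```

If the pkey you pass to mpirun does not show up in this list on every node, the partition setup on the SM side is the first thing to look at.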
Pasha is currently checking this issue.
Best regards,
Lenny.
On 10/7/08, Jeff Squyres <jsquyres@cisco.com> wrote:
FWIW, if this configuration is for all of your users, you might want to specify these MCA params in the default MCA param file, in the environment, etc., just so that you don't have to specify them on every mpirun command line.
See http://www.open-mpi.org/faq/?category=tuning#setting-mca-params.
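As a sketch of those two options, using the parameter name and decimal value from this thread (check them against your own setup). The helper write_mca_params is hypothetical; the OMPI_MCA_ prefix and the per-user file location are the ones described in the FAQ.

```shell
# Option 1: environment variables. Open MPI reads any variable named
# OMPI_MCA_<param> as the value of MCA parameter <param>.
export OMPI_MCA_btl=openib,self
export OMPI_MCA_btl_openib_ib_pkey_val=33033

# Option 2: a parameter file with one "name = value" pair per line.
# write_mca_params is a hypothetical helper; it appends to the file given.
write_mca_params() {
    cat >> "$1" <<'EOF'
btl = openib,self
btl_openib_ib_pkey_val = 33033
EOF
}

# Per-user file described in the FAQ (uncomment to apply):
# mkdir -p "$HOME/.openmpi" && write_mca_params "$HOME/.openmpi/mca-params.conf"
```

Values given on the mpirun command line still take precedence, so a single run can override either default.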
On Oct 7, 2008, at 5:43 AM, Lenny Verkhovsky wrote:
Sorry, misunderstood the question,
thanks to Pasha, the right command line will be
-mca btl openib,self -mca btl_openib_of_pkey_val 0x8109 -mca btl_openib_of_pkey_ix 1
ex.
#mpirun -np 2 -H witch2,witch3 -mca btl openib,self -mca btl_openib_of_pkey_val 0x8001 -mca btl_openib_of_pkey_ix 1 ./mpi_p1_4_TRUNK -t lt
LT (2) (size min max avg) 1 3.443480 3.443480 3.443480
Best regards
Lenny.
On 10/6/08, Jeff Squyres <jsquyres@cisco.com> wrote:
On Oct 5, 2008, at 1:22 PM, Lenny Verkhovsky wrote:
you should probably use -mca btl tcp,self -mca btl_openib_if_include ib0.8109
Really? I thought we only took OpenFabrics device names in the openib_if_include MCA param...? It looks like ib0.8109 is an IPoIB device name.
Lenny.
On 10/3/08, Matt Burgess <burgess.matt@gmail.com> wrote:
Hi,
I'm trying to get openmpi working over openib partitions. On this cluster, the partition number is 0x109. The ib interfaces are pingable over the appropriate ib0.8109 interface:
d2:/opt/openmpi-ib # ifconfig ib0.8109
ib0.8109 Link encap:UNSPEC HWaddr 80-00-00-4A-FE-80-00-00-00-00-00-00-00-00-00-00
inet addr:10.21.48.2 Bcast:10.21.255.255 Mask:255.255.0.0
inet6 addr: fe80::202:c902:26:ca01/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:65520 Metric:1
RX packets:16811 errors:0 dropped:0 overruns:0 frame:0
TX packets:15848 errors:0 dropped:1 overruns:0 carrier:0
collisions:0 txqueuelen:256
RX bytes:102229428 (97.4 Mb) TX bytes:102324172 (97.5 Mb)
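For readers wondering how partition 0x109 relates to the 8109 in the interface name: the top bit of the 16-bit pkey is the membership bit, and it is set for a full member. A small sketch of the arithmetic (full_member_pkey is a made-up helper name):

```shell
# full_member_pkey is a hypothetical helper, not part of any IB tool.
# It sets the top (membership) bit of a 16-bit partition number.
full_member_pkey() {
    printf '0x%04x\n' $(( $1 | 0x8000 ))
}

full_member_pkey 0x109   # prints 0x8109
```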
I have tried the following:
/opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -machinefile machinefile -mca btl openib,self -mca btl_openib_max_btls 1 -mca btl_openib_ib_pkey_val 0x8109 -mca btl_openib_ib_pkey_ix 1 /cluster/pallas/x86_64-ib/IMB-MPI1
but I just get a RETRY EXCEEDED ERROR. Is there an MCA parameter I am missing?
I was successful using tcp only:
/opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -machinefile machinefile -mca btl tcp,self -mca btl_openib_max_btls 1 -mca btl_openib_ib_pkey_val 0x8109 /cluster/pallas/x86_64-ib/IMB-MPI1
Thanks,
Matt Burgess
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
--
Jeff Squyres
Cisco Systems