Thank you for your reply, Reese!
> What version of GM are you running?
# rpm -qa |egrep "^gm-[0-9]+|^gm-devel"
Is this too old?
> And are you sure that gm_board_info
> shows all the nodes that are listed in your machine file?
Yes, that was the issue: a bad cable connection to my compute node
prevented it from being seen on the fabric :( Thanks for pointing this
out.
> Could you send
> a copy of your gm_board_info output , please?
GM build ID is "2.0.24_Linux_rc20051223164441PST @dr11.myco.com:/usr/src/redhat/BUILD/gm-2.0.24_Linux Tue Jan 30 23:07:45 EST 2007."
Board number 0:
lanai_cpu_version = 0x0a00 (LANai10.0)
lanai_sram_size = 0x001fe000 (2040K bytes)
LANai time is 0x209b211b12 ticks, or about 1043 minutes since reset.
Mapper is 00:60:dd:49:99:96.
Map version is 1965903.
Network is fully configured.
This node is "dr11.myco.com"
Board has room for 16 ports, 1559 nodes/routes, 16384 cache entries
Port token cnt: send=61, recv=253
Port: Status PID
   0:   BUSY 7489  (this process [gm_board_info])
   1:   BUSY 25113
Route table for this node follows:
gmID MAC Address       gmName                           Route
---- ----------------- -------------------------------- ---------------------
   1 00:60:dd:49:1e:bf dr11.myco.com                    (this node)
   2 00:60:dd:49:99:96 dr05.myco.com                    81 (mapper)
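For the archives: the gmName column of the route table above lists every node the mapper found, so a node that is missing from the fabric (as mine was) can be caught by comparing that column against the machine file. Below is a minimal Python sketch of that check. It assumes the route-table row layout shown above, a machine file with one hostname per line (extra tokens such as "slots=2" and "#" comments are ignored), and that hostnames in the machine file match gmName exactly (short names vs. FQDNs are not reconciled). The function names are my own, hypothetical ones.

```python
import re
from typing import Set

# Assumed row layout: "<gmID> <MAC address> <gmName> <route...>", e.g.
# "   2 00:60:dd:49:99:96 dr05.myco.com    81 (mapper)".
ROUTE_ROW = re.compile(r"^\s*\d+\s+(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}\s+(\S+)")


def fabric_hosts(board_info: str) -> Set[str]:
    """Return the gmName of every node in gm_board_info's route table."""
    hosts = set()
    for line in board_info.splitlines():
        match = ROUTE_ROW.match(line)
        if match:
            hosts.add(match.group(1))
    return hosts


def machine_hosts(machine_file: str) -> Set[str]:
    """Return hostnames from a machine file (first token per line, '#' = comment)."""
    hosts = set()
    for line in machine_file.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.add(line.split()[0])
    return hosts


def missing_from_fabric(board_info: str, machine_file: str) -> Set[str]:
    """Hosts listed in the machine file that the mapper did not find."""
    return machine_hosts(machine_file) - fabric_hosts(board_info)
```

Feeding it the live output, e.g. missing_from_fabric(subprocess.run(["gm_board_info"], capture_output=True, text=True).stdout, open("machines").read()), should return the set of machine-file hosts that are not on the fabric; an empty set means the two lists agree.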
> A mismatch between the list
> of nodes actually configured onto the Myrinet fabric and the machine file is
> a common source of errors like this. The mismatch could be caused by cable
> failure or other mapping issues.
Could you elaborate on the mapping issues you mentioned? What kinds of problems fall into that category?
> Why GM instead of MX, by the way?
We have a few MX cards in-house, but no MX switch because of its current
market price. So we can only test MX using direct (back-to-back)
cables, which is not very exciting :) By contrast, we already have GM
boards and a switch, and have found them sufficient for Open MPI
testing. It would be great to upgrade to MX in the near future.
Thank you very much for your help.