Open MPI User's Mailing List Archives

From: Tim Prins (tprins_at_[hidden])
Date: 2007-10-01 23:02:31


Hi,

On Monday 01 October 2007 03:08:04 am Hammad Siddiqi wrote:
> One more thing to add -mca mtl mx uses ethernet and IP emulation of
> Myrinet to my knowledge. I want to use Myrinet(not its IP Emulation)
> and shared memory simultaneously.
This is not true (as far as I know...). Open MPI has two different network
stacks, one built on BTLs and one built on MTLs, and MX can be used with
either. See:
http://www.open-mpi.org/faq/?category=myrinet#myri-btl-mx

The mx mtl relies on the MX library for all communications, and the MX library
itself does shared memory message passing. In my experience the mx mtl
performs better than the mx,sm,self btl combination. However, I would
encourage you to try both with your application and would be interested in
hearing your opinion.
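
For anyone who wants to try the comparison, here is a minimal sketch that
prints the two invocations side by side. The mpirun path, host names and
./hello are taken from the commands quoted in this thread; DRY_RUN is a
hypothetical switch added only for this example so that nothing is launched
unless you ask for it.

```shell
#!/bin/sh
# Sketch only: print (and optionally run) the same job on both network stacks.
# MPIRUN, HOSTS, NP and APP mirror the commands quoted in this thread;
# DRY_RUN is a hypothetical switch introduced for this example.
MPIRUN=${MPIRUN:-/opt/SUNWhpc/HPC7.0/bin/mpirun}
HOSTS=${HOSTS:-indus1,indus2}
NP=${NP:-2}
APP=${APP:-./hello}
DRY_RUN=${DRY_RUN:-1}

# MTL stack: the cm pml hands every message, on-node or off-node, to the MX library.
mtl_cmd() {
    echo "$MPIRUN -np $NP -host $HOSTS -mca pml cm -mca mtl mx $APP"
}

# BTL stack: mx between nodes, sm within a node, self for a process talking to itself.
btl_cmd() {
    echo "$MPIRUN -np $NP -host $HOSTS -mca btl mx,sm,self $APP"
}

for cmd in "$(mtl_cmd)" "$(btl_cmd)"; do
    echo "+ $cmd"
    # Launch only when DRY_RUN=0; otherwise just show what would be run.
    if [ "$DRY_RUN" = 0 ]; then
        $cmd
    fi
done
```

Run it with DRY_RUN=0 and your own application in APP to get a real
comparison; the hello program is too small to show a performance difference.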

<snip>
> > *1. /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host
> > "indus1,indus2" -mca btl_base_debug 1000 ./hello*
> >
> > /opt/SUNWhpc/HPC7.0/bin/mpirun -np 4 -mca btl mx,sm,self -host
> > "indus1,indus2,indus3,indus4" -mca btl_base_debug 1000 ./hello
> > [indus1:29331] select: initializing btl component mx
> > [indus1:29331] select: init returned failure
> > [indus1:29331] select: module mx unloaded
<snip>

So it looks like we are trying to load the mx library, but fail for some
reason. Are you sure MX is working correctly? Can you run mx_pingpong between
indus1 and indus2 as described here:
http://www.myri.com/cgi-bin/fom.pl?file=455&keywords=file%253D91
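
With btl_base_debug turned up, the output gets long quickly. As a rough
aid, the sketch below picks out the components whose init failed. It assumes
the two message formats shown in the log quoted above; failed_components is
a hypothetical helper name, and other Open MPI versions may word these
messages differently.

```shell
#!/bin/sh
# Sketch: list components whose init failed, from btl_base_debug output on stdin.
# The two line formats matched here are copied from the log quoted above.
failed_components() {
    awk '
        # Remember, per "[host:pid]" tag, the component being initialized.
        /select: initializing btl component/ { current[$1] = $NF }
        # A failure line refers to the last component started by that process.
        /select: init returned failure/ { if ($1 in current) print current[$1] }
    ' | sort -u
}

# Example input taken from the thread (quote markers removed).
failed_components <<'EOF'
[indus1:29331] select: initializing btl component mx
[indus1:29331] select: init returned failure
[indus1:29331] select: module mx unloaded
EOF
```

Keying on the [host:pid] tag keeps the result meaningful when the output of
several processes is interleaved.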

> > *2.1 /opt/SUNWhpc/HPC7.0/bin/mpirun -np 4 -mca mtl mx -host
> > "indus1,indus2,indus3,indus4" ./hello*
> >
> > This command works fine
Since you did not select the cm pml (which MUST be done to use the mx mtl;
see http://www.open-mpi.org/faq/?category=myrinet#myri-btl-mx), you were
probably actually using tcp for this run, since we automatically fall back
to tcp after the mx btl fails to load.
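
One way to check for this kind of silent fallback is to exclude the tcp btl,
so that the run should abort instead of quietly using tcp. The sketch below
is command 2.1 from this thread with "-mca btl ^tcp" added. DRY_RUN is a
hypothetical switch for this example, and I have not tried this on your
setup, so treat the expected behaviour as an assumption.

```shell
#!/bin/sh
# Sketch: make a fallback to tcp visible by excluding the tcp btl.
# MPIRUN, HOSTS, NP and APP mirror command 2.1; DRY_RUN is hypothetical.
MPIRUN=${MPIRUN:-/opt/SUNWhpc/HPC7.0/bin/mpirun}
HOSTS=${HOSTS:-indus1,indus2,indus3,indus4}
NP=${NP:-4}
APP=${APP:-./hello}
DRY_RUN=${DRY_RUN:-1}

# Command 2.1 from the thread plus "-mca btl ^tcp", which excludes the tcp btl.
no_tcp_cmd() {
    echo "$MPIRUN -np $NP -host $HOSTS -mca mtl mx -mca btl ^tcp $APP"
}

echo "+ $(no_tcp_cmd)"
if [ "$DRY_RUN" = 0 ]; then
    # The caret is quoted because some Bourne shells treat a bare ^ as a pipe.
    "$MPIRUN" -np "$NP" -host "$HOSTS" -mca mtl mx -mca btl '^tcp' "$APP"
fi
```

If this run fails with an error about processes being unable to reach each
other, the earlier "working" run was most likely going over tcp.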

> > *2.2 /opt/SUNWhpc/HPC7.0/bin/mpirun -np 4 -mca mtl mx -host
> > "indus1,indus2,indus3,indus4" -mca pml cm ./hello*
> >
> > This command works fine.
Good. So maybe there isn't anything wrong with your mx setup.

> > Also *"/opt/SUNWhpc/HPC7.0/bin/mpirun -np 4 -mca pml cm -host
> > "indus1,indus2,indus3,indus4" -mca mtl_base_debug 1000 ./hello"*,
> > this command works fine.
Since you selected the cm pml, we should be automatically using the mx mtl
here.

> > but *"/opt/SUNWhpc/HPC7.0/bin/mpirun -np 8 -mca pml cm -host
> > "indus1,indus2,indus3,indus4" -mca mtl_base_debug 1000 ./hello"*
> > hangs for indefinite time.
Strange. I do not know why this would hang.

> > Also *"/opt/SUNWhpc/HPC7.0/bin/mpirun -np 8 -mca mtl mx,sm,self -host
> > "indus1,indus2,indus3,indus4" -mca mtl_base_debug 1000 ./hello"*
> > works fine
Again, you are falling back to using the tcp btl here. BTW, the mtl
string 'mx,sm,self' is bogus. There are no sm or self mtls.
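
A quick way to see which names are valid for a framework is to look them up
in the ompi_info output. The sketch below does that for a comma-separated
list. list_components and check_list are hypothetical helper names, and the
sample text is a few lines copied from the ompi_info output quoted further
down in this thread.

```shell
#!/bin/sh
# Sketch: check an "-mca <framework> <list>" value against ompi_info output.
# list_components and check_list are hypothetical helper names for this example.

# Print the component names ompi_info reports for framework $1 (text on stdin).
list_components() {
    sed -n "s/^ *MCA $1: \([^ ]*\) .*/\1/p"
}

# Report each name in comma-separated list $2 as ok or unknown for framework $1.
check_list() {
    available=$(list_components "$1")
    rc=0
    for name in $(echo "$2" | tr ',' ' '); do
        if echo "$available" | grep -Fqx "$name"; then
            echo "ok: $1 $name"
        else
            echo "unknown: $1 $name"
            rc=1
        fi
    done
    return $rc
}

# Sample lines taken from the ompi_info output quoted in this thread.
sample='MCA btl: mx (MCA v1.0, API v1.0.1, Component v1.2.1)
MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.1)
MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.1)
MCA mtl: mx (MCA v1.0, API v1.0, Component v1.2.1)'

echo "$sample" | check_list mtl mx,sm,self || echo "the mtl list names unknown components"
echo "$sample" | check_list btl mx,sm,self
```

On a real installation, pipe the live output in instead of the sample, for
example: ompi_info | check_list mtl mx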

> >
> > *2.3 /opt/SUNWhpc/HPC7.0/bin/mpirun -np 8 -mca mtl mx -host
> > "indus1,indus2,indus3,indus4" -mca pml cm ./hello*
> >
> > This command hangs the machines for indefinite time.
> > Also *"/opt/SUNWhpc/HPC7.0/bin/mpirun -np 8 -mca mtl mx -host
> > "indus1,indus2,indus3,indus4" -mca pml cm -mca mtl_base_debug 1000
> > ./hello"* hangs the
> > systems for indefinite time.
These two commands should behave exactly like the -np 8 run with the cm pml
that hung above.

> >
> > *2.4 /opt/SUNWhpc/HPC7.0/bin/mpirun -np 8 -mca mtl mx,sm,self -host
> > "indus1,indus2,indus3,indus4" -mca pml cm -mca mtl_base_debug 1000
> > ./hello*
> >
> > This command hangs the machines for indefinite time.
Again, the mtl line here is bogus.

> >
> > Please notice that running more than four mpi processes hangs the
> > machines. Any suggestion please.
The first thing I would try is to see if a non-MPI application works. So try:
/opt/SUNWhpc/HPC7.0/bin/mpirun -np 8 -host "indus1,indus2,indus3,indus4" hostname

If that works, then try a simple MPI hello application that does no
communication.
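
The two steps above can be sketched as a small staged script. The mpirun
path and host list come from this thread. NOCOMM_APP is a hypothetical name
for a program that only calls MPI_Init, prints its rank and calls
MPI_Finalize, and DRY_RUN is a hypothetical switch so that nothing is
launched by default.

```shell
#!/bin/sh
# Sketch of the staged check described above.
# NOCOMM_APP and DRY_RUN are hypothetical names introduced for this example.
MPIRUN=${MPIRUN:-/opt/SUNWhpc/HPC7.0/bin/mpirun}
HOSTS=${HOSTS:-indus1,indus2,indus3,indus4}
NP=${NP:-8}
NOCOMM_APP=${NOCOMM_APP:-./hello_nocomm}
DRY_RUN=${DRY_RUN:-1}

# Stage 1 launches a non-MPI program; stage 2 an MPI program that sends nothing.
stage_cmd() {
    case $1 in
        1) echo "$MPIRUN -np $NP -host $HOSTS hostname" ;;
        2) echo "$MPIRUN -np $NP -host $HOSTS $NOCOMM_APP" ;;
        *) return 1 ;;
    esac
}

for stage in 1 2; do
    cmd=$(stage_cmd "$stage")
    echo "stage $stage: $cmd"
    if [ "$DRY_RUN" = 0 ]; then
        # Stop at the first stage that fails; that is the layer to investigate.
        $cmd || { echo "stage $stage failed" >&2; exit 1; }
    fi
done
```

Note that a hang will not show up as a failure here; if a stage never
returns, that stage is the one to look at.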

Tim

> >>>>>
<snip>
> >>>>>
> >>>>> The output of *mx_info* on each node is given below
> >>>>>
> >>>>> ======
> >>>>> *indus1*
> >>>>> ======
> >>>>>
> >>>>> MX Version: 1.1.7rc3cvs1_1_fixes
> >>>>> MX Build: @indus4:/opt/mx2g-1.1.7rc3 Thu May 31 11:36:59 PKT 2007
> >>>>> 2 Myrinet boards installed.
> >>>>> The MX driver is configured to support up to 4 instances and 1024
> >>>>> nodes.
> >>>>> ===================================================================
> >>>>> Instance #0: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
> >>>>> Status: Running, P0: Link up
> >>>>> MAC Address: 00:60:dd:47:ad:7c
> >>>>> Product code: M3F-PCIXF-2
> >>>>> Part number: 09-03392
> >>>>> Serial number: 297218
> >>>>> Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
> >>>>> Mapped hosts: 10
> >>>>>
> >>>>>
> >>>>> ROUTE
> >>>>> COUNT
> >>>>> INDEX MAC ADDRESS HOST NAME P0
> >>>>> ----- ----------- --------- ---
> >>>>> 0) 00:60:dd:47:ad:7c indus1:0 1,1
> >>>>> 2) 00:60:dd:47:ad:68 indus4:0 8,3
> >>>>> 3) 00:60:dd:47:b3:e8 indus4:1 7,3
> >>>>> 4) 00:60:dd:47:b3:ab indus2:0 7,3
> >>>>> 5) 00:60:dd:47:ad:66 indus3:0 8,3
> >>>>> 6) 00:60:dd:47:ad:76 indus3:1 8,3
> >>>>> 7) 00:60:dd:47:ad:77 jhelum1:0 8,3
> >>>>> 8) 00:60:dd:47:b3:5a ravi2:0 8,3
> >>>>> 9) 00:60:dd:47:ad:5f ravi2:1 1,1
> >>>>> 10) 00:60:dd:47:b3:bf ravi1:0 8,3
> >>>>> ===================================================================
> >>>>>
> >>>>> ======
> >>>>> *indus2*
> >>>>> ======
> >>>>>
> >>>>> MX Version: 1.1.7rc3cvs1_1_fixes
> >>>>> MX Build: @indus2:/opt/mx2g-1.1.7rc3 Thu May 31 11:24:03 PKT 2007
> >>>>> 2 Myrinet boards installed.
> >>>>> The MX driver is configured to support up to 4 instances and 1024
> >>>>> nodes.
> >>>>> ===================================================================
> >>>>> Instance #0: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
> >>>>> Status: Running, P0: Link up
> >>>>> MAC Address: 00:60:dd:47:b3:ab
> >>>>> Product code: M3F-PCIXF-2
> >>>>> Part number: 09-03392
> >>>>> Serial number: 296636
> >>>>> Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
> >>>>> Mapped hosts: 10
> >>>>>
> >>>>> ROUTE
> >>>>> COUNT
> >>>>> INDEX MAC ADDRESS HOST NAME P0
> >>>>> ----- ----------- --------- ---
> >>>>> 0) 00:60:dd:47:b3:ab indus2:0 1,1
> >>>>> 2) 00:60:dd:47:ad:68 indus4:0 1,1
> >>>>> 3) 00:60:dd:47:b3:e8 indus4:1 8,3
> >>>>> 4) 00:60:dd:47:ad:66 indus3:0 1,1
> >>>>> 5) 00:60:dd:47:ad:76 indus3:1 7,3
> >>>>> 6) 00:60:dd:47:ad:77 jhelum1:0 7,3
> >>>>> 8) 00:60:dd:47:ad:7c indus1:0 8,3
> >>>>> 9) 00:60:dd:47:b3:5a ravi2:0 8,3
> >>>>> 10) 00:60:dd:47:ad:5f ravi2:1 8,3
> >>>>> 11) 00:60:dd:47:b3:bf ravi1:0 7,3
> >>>>> ===================================================================
> >>>>> Instance #1: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
> >>>>> Status: Running, P0: Link down
> >>>>> MAC Address: 00:60:dd:47:b3:c3
> >>>>> Product code: M3F-PCIXF-2
> >>>>> Part number: 09-03392
> >>>>> Serial number: 296612
> >>>>> Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
> >>>>> Mapped hosts: 10
> >>>>>
> >>>>> ======
> >>>>> *indus3*
> >>>>> ======
> >>>>> MX Version: 1.1.7rc3cvs1_1_fixes
> >>>>> MX Build: @indus3:/opt/mx2g-1.1.7rc3 Thu May 31 11:29:03 PKT 2007
> >>>>> 2 Myrinet boards installed.
> >>>>> The MX driver is configured to support up to 4 instances and 1024
> >>>>> nodes.
> >>>>> ===================================================================
> >>>>> Instance #0: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
> >>>>> Status: Running, P0: Link up
> >>>>> MAC Address: 00:60:dd:47:ad:66
> >>>>> Product code: M3F-PCIXF-2
> >>>>> Part number: 09-03392
> >>>>> Serial number: 297240
> >>>>> Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
> >>>>> Mapped hosts: 10
> >>>>>
> >>>>> ROUTE
> >>>>> COUNT
> >>>>> INDEX MAC ADDRESS HOST NAME P0
> >>>>> ----- ----------- --------- ---
> >>>>> 0) 00:60:dd:47:ad:66 indus3:0 1,1
> >>>>> 1) 00:60:dd:47:ad:76 indus3:1 8,3
> >>>>> 2) 00:60:dd:47:ad:68 indus4:0 1,1
> >>>>> 3) 00:60:dd:47:b3:e8 indus4:1 6,3
> >>>>> 4) 00:60:dd:47:ad:77 jhelum1:0 8,3
> >>>>> 5) 00:60:dd:47:b3:ab indus2:0 1,1
> >>>>> 7) 00:60:dd:47:ad:7c indus1:0 8,3
> >>>>> 8) 00:60:dd:47:b3:5a ravi2:0 8,3
> >>>>> 9) 00:60:dd:47:ad:5f ravi2:1 7,3
> >>>>> 10) 00:60:dd:47:b3:bf ravi1:0 8,3
> >>>>> ===================================================================
> >>>>> Instance #1: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
> >>>>> Status: Running, P0: Link up
> >>>>> MAC Address: 00:60:dd:47:ad:76
> >>>>> Product code: M3F-PCIXF-2
> >>>>> Part number: 09-03392
> >>>>> Serial number: 297224
> >>>>> Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
> >>>>> Mapped hosts: 10
> >>>>>
> >>>>> ROUTE
> >>>>> COUNT
> >>>>> INDEX MAC ADDRESS HOST NAME P0
> >>>>> ----- ----------- --------- ---
> >>>>> 0) 00:60:dd:47:ad:66 indus3:0 8,3
> >>>>> 1) 00:60:dd:47:ad:76 indus3:1 1,1
> >>>>> 2) 00:60:dd:47:ad:68 indus4:0 7,3
> >>>>> 3) 00:60:dd:47:b3:e8 indus4:1 1,1
> >>>>> 4) 00:60:dd:47:ad:77 jhelum1:0 1,1
> >>>>> 5) 00:60:dd:47:b3:ab indus2:0 7,3
> >>>>> 7) 00:60:dd:47:ad:7c indus1:0 8,3
> >>>>> 8) 00:60:dd:47:b3:5a ravi2:0 6,3
> >>>>> 9) 00:60:dd:47:ad:5f ravi2:1 8,3
> >>>>> 10) 00:60:dd:47:b3:bf ravi1:0 8,3
> >>>>>
> >>>>> ======
> >>>>> *indus4*
> >>>>> ======
> >>>>>
> >>>>> MX Version: 1.1.7rc3cvs1_1_fixes
> >>>>> MX Build: @indus4:/opt/mx2g-1.1.7rc3 Thu May 31 11:36:59 PKT 2007
> >>>>> 2 Myrinet boards installed.
> >>>>> The MX driver is configured to support up to 4 instances and 1024
> >>>>> nodes.
> >>>>> ===================================================================
> >>>>> Instance #0: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
> >>>>> Status: Running, P0: Link up
> >>>>> MAC Address: 00:60:dd:47:ad:68
> >>>>> Product code: M3F-PCIXF-2
> >>>>> Part number: 09-03392
> >>>>> Serial number: 297238
> >>>>> Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
> >>>>> Mapped hosts: 10
> >>>>>
> >>>>> ROUTE
> >>>>> COUNT
> >>>>> INDEX MAC ADDRESS HOST NAME P0
> >>>>> ----- ----------- --------- ---
> >>>>> 0) 00:60:dd:47:ad:68 indus4:0 1,1
> >>>>> 1) 00:60:dd:47:b3:e8 indus4:1 7,3
> >>>>> 2) 00:60:dd:47:ad:77 jhelum1:0 7,3
> >>>>> 3) 00:60:dd:47:ad:66 indus3:0 1,1
> >>>>> 4) 00:60:dd:47:ad:76 indus3:1 7,3
> >>>>> 5) 00:60:dd:47:b3:ab indus2:0 1,1
> >>>>> 7) 00:60:dd:47:ad:7c indus1:0 7,3
> >>>>> 8) 00:60:dd:47:b3:5a ravi2:0 7,3
> >>>>> 9) 00:60:dd:47:ad:5f ravi2:1 8,3
> >>>>> 10) 00:60:dd:47:b3:bf ravi1:0 7,3
> >>>>> ===================================================================
> >>>>> Instance #1: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
> >>>>> Status: Running, P0: Link up
> >>>>> MAC Address: 00:60:dd:47:b3:e8
> >>>>> Product code: M3F-PCIXF-2
> >>>>> Part number: 09-03392
> >>>>> Serial number: 296575
> >>>>> Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
> >>>>> Mapped hosts: 10
> >>>>>
> >>>>> ROUTE
> >>>>> COUNT
> >>>>> INDEX MAC ADDRESS HOST NAME P0
> >>>>> ----- ----------- --------- ---
> >>>>> 0) 00:60:dd:47:ad:68 indus4:0 6,3
> >>>>> 1) 00:60:dd:47:b3:e8 indus4:1 1,1
> >>>>> 2) 00:60:dd:47:ad:77 jhelum1:0 1,1
> >>>>> 3) 00:60:dd:47:ad:66 indus3:0 8,3
> >>>>> 4) 00:60:dd:47:ad:76 indus3:1 1,1
> >>>>> 5) 00:60:dd:47:b3:ab indus2:0 8,3
> >>>>> 7) 00:60:dd:47:ad:7c indus1:0 7,3
> >>>>> 8) 00:60:dd:47:b3:5a ravi2:0 6,3
> >>>>> 9) 00:60:dd:47:ad:5f ravi2:1 8,3
> >>>>> 10) 00:60:dd:47:b3:bf ravi1:0 8,3
> >>>>>
> >>>>> The output from *ompi_info* is:
> >>>>>
> >>>>> Open MPI: 1.2.1r14096-ct7b030r1838
> >>>>> Open MPI SVN revision: 0
> >>>>> Open RTE: 1.2.1r14096-ct7b030r1838
> >>>>> Open RTE SVN revision: 0
> >>>>> OPAL: 1.2.1r14096-ct7b030r1838
> >>>>> OPAL SVN revision: 0
> >>>>> Prefix: /opt/SUNWhpc/HPC7.0
> >>>>> Configured architecture: sparc-sun-solaris2.10
> >>>>> Configured by: root
> >>>>> Configured on: Fri Mar 30 12:49:36 EDT 2007
> >>>>> Configure host: burpen-on10-0
> >>>>> Built by: root
> >>>>> Built on: Fri Mar 30 13:10:46 EDT 2007
> >>>>> Built host: burpen-on10-0
> >>>>> C bindings: yes
> >>>>> C++ bindings: yes
> >>>>> Fortran77 bindings: yes (all)
> >>>>> Fortran90 bindings: yes
> >>>>> Fortran90 bindings size: trivial
> >>>>> C compiler: cc
> >>>>> C compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/cc
> >>>>> C++ compiler: CC
> >>>>> C++ compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/CC
> >>>>> Fortran77 compiler: f77
> >>>>> Fortran77 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f77
> >>>>> Fortran90 compiler: f95
> >>>>> Fortran90 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f95
> >>>>> C profiling: yes
> >>>>> C++ profiling: yes
> >>>>> Fortran77 profiling: yes
> >>>>> Fortran90 profiling: yes
> >>>>> C++ exceptions: yes
> >>>>> Thread support: no
> >>>>> Internal debug support: no
> >>>>> MPI parameter check: runtime
> >>>>> Memory profiling support: no
> >>>>> Memory debugging support: no
> >>>>> libltdl support: yes
> >>>>> Heterogeneous support: yes
> >>>>> mpirun default --prefix: yes
> >>>>> MCA backtrace: printstack (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA paffinity: solaris (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA timer: solaris (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
> >>>>> MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
> >>>>> MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA coll: self (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA io: romio (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA mpool: udapl (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA rcache: rb (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA btl: mx (MCA v1.0, API v1.0.1, Component v1.2.1)
> >>>>> MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.1)
> >>>>> MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.1)
> >>>>> MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
> >>>>> MCA btl: udapl (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA mtl: mx (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.1)
> >>>>> MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.1)
> >>>>> MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.1)
> >>>>> MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.1)
> >>>>> MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.1)
> >>>>> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
> >>>>> MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.1)
> >>>>> MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.1)
> >>>>> MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.1)
> >>>>> MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.1)
> >>>>> MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.1)
> >>>>> MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.1)
> >>>>> MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.1)
> >>>>> MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2.1)
> >>>>> MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.1)
> >>>>> MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.1)
> >>>>> MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.1)
> >>>>> MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.1)
> >>>>> MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.1)
> >>>>> MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.1)
> >>>>> MCA sds: env (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA sds: singleton (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>>
<snip>
> >>>>> The output from more */var/run/fms/fma.log*
> >>>>>
> >>>>> Sat Sep 22 10:47:50 2007 NIC 0: M3F-PCIXF-2 s/n=297218 1 ports, speed=2G
> >>>>> Sat Sep 22 10:47:50 2007 mac = 00:60:dd:47:ad:7c
> >>>>> Sat Sep 22 10:47:50 2007 NIC 1: M3F-PCIXF-2 s/n=297248 1 ports, speed=2G
> >>>>> Sat Sep 22 10:47:50 2007 mac = 00:60:dd:47:ad:5e
> >>>>> Sat Sep 22 10:47:50 2007 fms-1.2.1 fma starting
> >>>>> Sat Sep 22 10:47:50 2007 Mapper was 00:00:00:00:00:00, l=0, is now
> >>>>> 00:60:dd:47:ad:7c, l=1
> >>>>> Sat Sep 22 10:47:50 2007 Mapping fabric...
> >>>>> Sat Sep 22 10:47:54 2007 Mapper was 00:60:dd:47:ad:7c, l=1, is now
> >>>>> 00:60:dd:47:b3:e8, l=1
> >>>>> Sat Sep 22 10:47:54 2007 Cancelling mapping
> >>>>> Sat Sep 22 10:47:59 2007 5 hosts, 8 nics, 6 xbars, 40 links
> >>>>> Sat Sep 22 10:47:59 2007 map version is 1987557551
> >>>>> Sat Sep 22 10:47:59 2007 Found NIC 0 at index 3!
> >>>>> Sat Sep 22 10:47:59 2007 Found NIC 1 at index 2!
> >>>>> Sat Sep 22 10:47:59 2007 map seems OK
> >>>>> Sat Sep 22 10:47:59 2007 Routing took 0 seconds
> >>>>> Mon Sep 24 14:26:46 2007 Requesting remap from indus4
> >>>>> (00:60:dd:47:b3:e8): scouted by 00:60:dd:47:b3:5a, lev=1, pkt_type=0
> >>>>> Mon Sep 24 14:26:51 2007 6 hosts, 10 nics, 6 xbars, 42 links
> >>>>> Mon Sep 24 14:26:51 2007 map version is 1987557552
> >>>>> Mon Sep 24 14:26:51 2007 Found NIC 0 at index 3!
> >>>>> Mon Sep 24 14:26:51 2007 Found NIC 1 at index 2!
> >>>>> Mon Sep 24 14:26:51 2007 map seems OK
> >>>>> Mon Sep 24 14:26:51 2007 Routing took 0 seconds
> >>>>> Mon Sep 24 14:35:17 2007 Requesting remap from indus4
> >>>>> (00:60:dd:47:b3:e8): scouted by 00:60:dd:47:b3:bf, lev=1, pkt_type=0
> >>>>> Mon Sep 24 14:35:19 2007 7 hosts, 11 nics, 6 xbars, 43 links
> >>>>> Mon Sep 24 14:35:19 2007 map version is 1987557553
> >>>>> Mon Sep 24 14:35:19 2007 Found NIC 0 at index 5!
> >>>>> Mon Sep 24 14:35:19 2007 Found NIC 1 at index 4!
> >>>>> Mon Sep 24 14:35:19 2007 map seems OK
> >>>>> Mon Sep 24 14:35:19 2007 Routing took 0 seconds
> >>>>> Tue Sep 25 21:47:52 2007 6 hosts, 9 nics, 6 xbars, 41 links
> >>>>> Tue Sep 25 21:47:52 2007 map version is 1987557554
> >>>>> Tue Sep 25 21:47:52 2007 Found NIC 0 at index 3!
> >>>>> Tue Sep 25 21:47:52 2007 Found NIC 1 at index 2!
> >>>>> Tue Sep 25 21:47:52 2007 map seems OK
> >>>>> Tue Sep 25 21:47:52 2007 Routing took 0 seconds
> >>>>> Tue Sep 25 21:52:02 2007 Requesting remap from indus4
> >>>>> (00:60:dd:47:b3:e8): empty port x0p15 is no longer empty
> >>>>> Tue Sep 25 21:52:07 2007 6 hosts, 10 nics, 6 xbars, 42 links
> >>>>> Tue Sep 25 21:52:07 2007 map version is 1987557555
> >>>>> Tue Sep 25 21:52:07 2007 Found NIC 0 at index 4!
> >>>>> Tue Sep 25 21:52:07 2007 Found NIC 1 at index 3!
> >>>>> Tue Sep 25 21:52:07 2007 map seems OK
> >>>>> Tue Sep 25 21:52:07 2007 Routing took 0 seconds
> >>>>> Tue Sep 25 21:52:23 2007 7 hosts, 11 nics, 6 xbars, 43 links
> >>>>> Tue Sep 25 21:52:23 2007 map version is 1987557556
> >>>>> Tue Sep 25 21:52:23 2007 Found NIC 0 at index 6!
> >>>>> Tue Sep 25 21:52:23 2007 Found NIC 1 at index 5!
> >>>>> Tue Sep 25 21:52:23 2007 map seems OK
> >>>>> Tue Sep 25 21:52:23 2007 Routing took 0 seconds
> >>>>> Wed Sep 26 05:07:01 2007 Requesting remap from indus4
> >>>>> (00:60:dd:47:b3:e8): verify failed x1p2, nic 0, port 0 route=-9 4 10
> >>>>> reply=-10 -4 9 , remote=ravi2 NIC
> >>>>> 1, p0 mac=00:60:dd:47:ad:5f
> >>>>> Wed Sep 26 05:07:06 2007 6 hosts, 9 nics, 6 xbars, 41 links
> >>>>> Wed Sep 26 05:07:06 2007 map version is 1987557557
> >>>>> Wed Sep 26 05:07:06 2007 Found NIC 0 at index 3!
> >>>>> Wed Sep 26 05:07:06 2007 Found NIC 1 at index 2!
> >>>>> Wed Sep 26 05:07:06 2007 map seems OK
> >>>>> Wed Sep 26 05:07:06 2007 Routing took 0 seconds
> >>>>> Wed Sep 26 05:11:19 2007 7 hosts, 11 nics, 6 xbars, 43 links
> >>>>> Wed Sep 26 05:11:19 2007 map version is 1987557558
> >>>>> Wed Sep 26 05:11:19 2007 Found NIC 0 at index 3!
> >>>>> Wed Sep 26 05:11:19 2007 Found NIC 1 at index 2!
> >>>>> Wed Sep 26 05:11:19 2007 map seems OK
> >>>>> Wed Sep 26 05:11:19 2007 Routing took 0 seconds
> >>>>> Thu Sep 27 11:45:37 2007 6 hosts, 9 nics, 6 xbars, 41 links
> >>>>> Thu Sep 27 11:45:37 2007 map version is 1987557559
> >>>>> Thu Sep 27 11:45:37 2007 Found NIC 0 at index 6!
> >>>>> Thu Sep 27 11:45:37 2007 Found NIC 1 at index 5!
> >>>>> Thu Sep 27 11:45:37 2007 map seems OK
> >>>>> Thu Sep 27 11:45:37 2007 Routing took 0 seconds
> >>>>> Thu Sep 27 11:51:02 2007 7 hosts, 11 nics, 6 xbars, 43 links
> >>>>> Thu Sep 27 11:51:02 2007 map version is 1987557560
> >>>>> Thu Sep 27 11:51:02 2007 Found NIC 0 at index 6!
> >>>>> Thu Sep 27 11:51:02 2007 Found NIC 1 at index 5!
> >>>>> Thu Sep 27 11:51:02 2007 map seems OK
> >>>>> Thu Sep 27 11:51:02 2007 Routing took 0 seconds
> >>>>> Fri Sep 28 13:27:10 2007 Requesting remap from indus4
> >>>>> (00:60:dd:47:b3:e8): verify failed x5p0, nic 1, port 0 route=-8 15 6
> >>>>> reply=-6 -15 8 , remote=ravi1 NIC
> >>>>> 0, p0 mac=00:60:dd:47:b3:bf
> >>>>> Fri Sep 28 13:27:24 2007 6 hosts, 8 nics, 6 xbars, 40 links
> >>>>> Fri Sep 28 13:27:24 2007 map version is 1987557561
> >>>>> Fri Sep 28 13:27:24 2007 Found NIC 0 at index 5!
> >>>>> Fri Sep 28 13:27:24 2007 Cannot find NIC 1 (00:60:dd:47:ad:5e) in map!
> >>>>> Fri Sep 28 13:27:24 2007 map seems OK
> >>>>> Fri Sep 28 13:27:24 2007 Routing took 0 seconds
> >>>>> Fri Sep 28 13:27:44 2007 7 hosts, 10 nics, 6 xbars, 42 links
> >>>>> Fri Sep 28 13:27:44 2007 map version is 1987557562
> >>>>> Fri Sep 28 13:27:44 2007 Found NIC 0 at index 7!
> >>>>> Fri Sep 28 13:27:44 2007 Cannot find NIC 1 (00:60:dd:47:ad:5e) in map!
> >>>>> Fri Sep 28 13:27:44 2007 map seems OK
> >>>>> Fri Sep 28 13:27:44 2007 Routing took 0 seconds
> >>>>>
> >>>>> Do you have any suggestions or comments on why this error appears and
> >>>>> what the solution to this problem is? I have checked the community mailing
> >>>>> list for this problem and found a few related topics, but could not
> >>>>> find any solution. Any suggestions or comments will be highly
> >>>>> appreciated.
> >>>>>
> >>>>> The code that I am trying to run is as follows:
> >>>>>
> >>>>> #include <stdio.h>
> >>>>> #include <string.h>   /* needed for strcpy */
> >>>>> #include "mpi.h"
> >>>>>
> >>>>> int main(int argc, char **argv)
> >>>>> {
> >>>>>     int rank, size, tag, rc, i;
> >>>>>     MPI_Status status;
> >>>>>     char message[20];
> >>>>>
> >>>>>     rc = MPI_Init(&argc, &argv);
> >>>>>     rc = MPI_Comm_size(MPI_COMM_WORLD, &size);
> >>>>>     rc = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >>>>>     tag = 100;
> >>>>>     if (rank == 0) {
> >>>>>         /* Rank 0 sends the greeting (13 bytes including the NUL) to every other rank. */
> >>>>>         strcpy(message, "Hello, world");
> >>>>>         for (i = 1; i < size; i++)
> >>>>>             rc = MPI_Send(message, 13, MPI_CHAR, i, tag, MPI_COMM_WORLD);
> >>>>>     } else {
> >>>>>         /* Every other rank blocks until the greeting from rank 0 arrives. */
> >>>>>         rc = MPI_Recv(message, 13, MPI_CHAR, 0, tag, MPI_COMM_WORLD, &status);
> >>>>>     }
> >>>>>     printf("node %d : %.13s\n", rank, message);
> >>>>>     rc = MPI_Finalize();
> >>>>>     return 0;
> >>>>> }