Open MPI User's Mailing List Archives

From: Tim Prins (tprins_at_[hidden])
Date: 2007-09-29 09:57:12


I would recommend trying a few things:

1. Set some debugging flags and see if that helps. So, I would try something
like:
/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,self \
    -host "indus1,indus2" -mca btl_base_debug 1000 ./hello

This will output information as each btl is loaded, and whether or not the
load succeeds.

2. Try running with the mx mtl instead of the btl:
/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca mtl mx -host "indus1,indus2" ./hello

Similarly, for debug output:
/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca mtl mx \
    -host "indus1,indus2" -mca mtl_base_debug 1000 ./hello
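
As a side note: instead of putting -mca flags on every command line, the same
MCA parameters can (as far as I know) also be set as environment variables of
the form OMPI_MCA_<param_name>, or placed in $HOME/.openmpi/mca-params.conf.
A rough sketch, assuming a Bourne-style shell:

# equivalent to passing "-mca btl_base_debug 1000" on the mpirun command line
export OMPI_MCA_btl_base_debug=1000
/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,self -host "indus1,indus2" ./hello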

Let me know if any of these work.

Thanks,

Tim

On Saturday 29 September 2007 01:53:06 am Hammad Siddiqi wrote:
> Hi Terry,
>
> Thanks for replying. The following command is working fine:
>
> /opt/SUNWhpc/HPC7.0/bin/mpirun -np 4 -mca btl tcp,sm,self -machinefile
> machines ./hello
>
> The contents of machines are:
> indus1
> indus2
> indus3
> indus4
>
> I have tried using np=2 over pairs of machines, but the problem is the same.
> The errors that occur are given below, along with the commands I tried.
>
> **Test 1**
>
> /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host
> "indus1,indus2" ./hello
> --------------------------------------------------------------------------
> Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
> --------------------------------------------------------------------------
> Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
>
> **Test 2**
>
> /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host
> "indus1,indus3" ./hello
> --------------------------------------------------------------------------
> Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
> --------------------------------------------------------------------------
> Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
>
> **Test 3**
>
> /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host
> "indus1,indus4" ./hello
> --------------------------------------------------------------------------
> Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
> --------------------------------------------------------------------------
> Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
>
> **Test 4**
>
> /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host
> "indus2,indus4" ./hello
> --------------------------------------------------------------------------
> Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
> --------------------------------------------------------------------------
> Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
>
> **Test 5**
>
> /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host
> "indus2,indus3" ./hello
> --------------------------------------------------------------------------
> Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
> --------------------------------------------------------------------------
> Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
>
> **Test 6**
>
> /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host
> "indus3,indus4" ./hello
> --------------------------------------------------------------------------
> Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
> --------------------------------------------------------------------------
> Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
>
> **END OF TESTS**
>
> One thing to note: when I run the command including -mca pml cm, it works
> fine :S
>
> mpirun -np 4 -mca btl mx,sm,self -mca pml cm -machinefile machines ./hello
> Hello MPI! Process 4 of 1 on indus2
> Hello MPI! Process 4 of 2 on indus3
> Hello MPI! Process 4 of 3 on indus4
> Hello MPI! Process 4 of 0 on indus1
>
> To my knowledge this command is not using shared memory and is only
> using Myrinet as the interconnect.
> One more thing: I cannot start more than 4 processes in this case; the
> mpirun process hangs.
>
> Any suggestions?
>
> Once again, thanks for your help.
>
> Regards,
> Hammad
>
> Terry Dontje wrote:
> > Hi Hammad,
> >
> > It looks to me like none of the BTLs could resolve a route between the
> > node that process rank 0 is on and the other nodes.
> > I would suggest trying np=2 over a couple of pairs of machines to see if
> > that works, so you can be sure whether only the
> > first node is having this problem.
> >
> > It also might be helpful as a sanity check to use the tcp btl instead of
> > mx and see if you get more traction with that.
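> >
> > For example (just a sketch reusing the install path and host names from
> > your own commands, so adjust as needed), something along the lines of:
> >
> > /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl tcp,sm,self -host "indus1,indus2" ./hello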
> >
> > --td
> >
> >> From: Hammad Siddiqi (hammad.siddiqi_at_[hidden])
> >> Date: 2007-09-28 07:38:01
> >>
> >> Hello,
> >>
> >> I am using Sun HPC Toolkit 7.0 to compile and run my C MPI programs.
> >>
> >> I have tested the Myrinet installation using Myricom's own test
> >> programs. The Myricom software stack I am using is MX, version
> >> mx2g-1.1.7; mx_mapper is also used.
> >> We have 4 nodes with 8 dual-core processors each (Sun Fire V890), and
> >> the operating system is
> >> Solaris 10 (SunOS indus1 5.10 Generic_125100-10 sun4u sparc
> >> SUNW,Sun-Fire-V890).
> >>
> >> The contents of the machine file are:
> >> indus1
> >> indus2
> >> indus3
> >> indus4
> >>
> >> The output of *mx_info* on each node is given below:
> >>
> >> ======
> >> *indus1*
> >> ======
> >>
> >> MX Version: 1.1.7rc3cvs1_1_fixes
> >> MX Build: @indus4:/opt/mx2g-1.1.7rc3 Thu May 31 11:36:59 PKT 2007
> >> 2 Myrinet boards installed.
> >> The MX driver is configured to support up to 4 instances and 1024 nodes.
> >> ===================================================================
> >> Instance #0: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
> >> Status: Running, P0: Link up
> >> MAC Address: 00:60:dd:47:ad:7c
> >> Product code: M3F-PCIXF-2
> >> Part number: 09-03392
> >> Serial number: 297218
> >> Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
> >> Mapped hosts: 10
> >>
> >>
> >> ROUTE COUNT
> >> INDEX MAC ADDRESS HOST NAME P0
> >> ----- ----------- --------- ---
> >> 0) 00:60:dd:47:ad:7c indus1:0 1,1
> >> 2) 00:60:dd:47:ad:68 indus4:0 8,3
> >> 3) 00:60:dd:47:b3:e8 indus4:1 7,3
> >> 4) 00:60:dd:47:b3:ab indus2:0 7,3
> >> 5) 00:60:dd:47:ad:66 indus3:0 8,3
> >> 6) 00:60:dd:47:ad:76 indus3:1 8,3
> >> 7) 00:60:dd:47:ad:77 jhelum1:0 8,3
> >> 8) 00:60:dd:47:b3:5a ravi2:0 8,3
> >> 9) 00:60:dd:47:ad:5f ravi2:1 1,1
> >> 10) 00:60:dd:47:b3:bf ravi1:0 8,3
> >> ===================================================================
> >>
> >> ======
> >> *indus2*
> >> ======
> >>
> >> MX Version: 1.1.7rc3cvs1_1_fixes
> >> MX Build: @indus2:/opt/mx2g-1.1.7rc3 Thu May 31 11:24:03 PKT 2007
> >> 2 Myrinet boards installed.
> >> The MX driver is configured to support up to 4 instances and 1024 nodes.
> >> ===================================================================
> >> Instance #0: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
> >> Status: Running, P0: Link up
> >> MAC Address: 00:60:dd:47:b3:ab
> >> Product code: M3F-PCIXF-2
> >> Part number: 09-03392
> >> Serial number: 296636
> >> Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
> >> Mapped hosts: 10
> >>
> >> ROUTE COUNT
> >> INDEX MAC ADDRESS HOST NAME P0
> >> ----- ----------- --------- ---
> >> 0) 00:60:dd:47:b3:ab indus2:0 1,1
> >> 2) 00:60:dd:47:ad:68 indus4:0 1,1
> >> 3) 00:60:dd:47:b3:e8 indus4:1 8,3
> >> 4) 00:60:dd:47:ad:66 indus3:0 1,1
> >> 5) 00:60:dd:47:ad:76 indus3:1 7,3
> >> 6) 00:60:dd:47:ad:77 jhelum1:0 7,3
> >> 8) 00:60:dd:47:ad:7c indus1:0 8,3
> >> 9) 00:60:dd:47:b3:5a ravi2:0 8,3
> >> 10) 00:60:dd:47:ad:5f ravi2:1 8,3
> >> 11) 00:60:dd:47:b3:bf ravi1:0 7,3
> >> ===================================================================
> >> Instance #1: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
> >> Status: Running, P0: Link down
> >> MAC Address: 00:60:dd:47:b3:c3
> >> Product code: M3F-PCIXF-2
> >> Part number: 09-03392
> >> Serial number: 296612
> >> Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
> >> Mapped hosts: 10
> >>
> >> ======
> >> *indus3*
> >> ======
> >> MX Version: 1.1.7rc3cvs1_1_fixes
> >> MX Build: @indus3:/opt/mx2g-1.1.7rc3 Thu May 31 11:29:03 PKT 2007
> >> 2 Myrinet boards installed.
> >> The MX driver is configured to support up to 4 instances and 1024 nodes.
> >> ===================================================================
> >> Instance #0: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
> >> Status: Running, P0: Link up
> >> MAC Address: 00:60:dd:47:ad:66
> >> Product code: M3F-PCIXF-2
> >> Part number: 09-03392
> >> Serial number: 297240
> >> Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
> >> Mapped hosts: 10
> >>
> >> ROUTE COUNT
> >> INDEX MAC ADDRESS HOST NAME P0
> >> ----- ----------- --------- ---
> >> 0) 00:60:dd:47:ad:66 indus3:0 1,1
> >> 1) 00:60:dd:47:ad:76 indus3:1 8,3
> >> 2) 00:60:dd:47:ad:68 indus4:0 1,1
> >> 3) 00:60:dd:47:b3:e8 indus4:1 6,3
> >> 4) 00:60:dd:47:ad:77 jhelum1:0 8,3
> >> 5) 00:60:dd:47:b3:ab indus2:0 1,1
> >> 7) 00:60:dd:47:ad:7c indus1:0 8,3
> >> 8) 00:60:dd:47:b3:5a ravi2:0 8,3
> >> 9) 00:60:dd:47:ad:5f ravi2:1 7,3
> >> 10) 00:60:dd:47:b3:bf ravi1:0 8,3
> >> ===================================================================
> >> Instance #1: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
> >> Status: Running, P0: Link up
> >> MAC Address: 00:60:dd:47:ad:76
> >> Product code: M3F-PCIXF-2
> >> Part number: 09-03392
> >> Serial number: 297224
> >> Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
> >> Mapped hosts: 10
> >>
> >> ROUTE COUNT
> >> INDEX MAC ADDRESS HOST NAME P0
> >> ----- ----------- --------- ---
> >> 0) 00:60:dd:47:ad:66 indus3:0 8,3
> >> 1) 00:60:dd:47:ad:76 indus3:1 1,1
> >> 2) 00:60:dd:47:ad:68 indus4:0 7,3
> >> 3) 00:60:dd:47:b3:e8 indus4:1 1,1
> >> 4) 00:60:dd:47:ad:77 jhelum1:0 1,1
> >> 5) 00:60:dd:47:b3:ab indus2:0 7,3
> >> 7) 00:60:dd:47:ad:7c indus1:0 8,3
> >> 8) 00:60:dd:47:b3:5a ravi2:0 6,3
> >> 9) 00:60:dd:47:ad:5f ravi2:1 8,3
> >> 10) 00:60:dd:47:b3:bf ravi1:0 8,3
> >>
> >> ======
> >> *indus4*
> >> ======
> >>
> >> MX Version: 1.1.7rc3cvs1_1_fixes
> >> MX Build: @indus4:/opt/mx2g-1.1.7rc3 Thu May 31 11:36:59 PKT 2007
> >> 2 Myrinet boards installed.
> >> The MX driver is configured to support up to 4 instances and 1024 nodes.
> >> ===================================================================
> >> Instance #0: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
> >> Status: Running, P0: Link up
> >> MAC Address: 00:60:dd:47:ad:68
> >> Product code: M3F-PCIXF-2
> >> Part number: 09-03392
> >> Serial number: 297238
> >> Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
> >> Mapped hosts: 10
> >>
> >> ROUTE COUNT
> >> INDEX MAC ADDRESS HOST NAME P0
> >> ----- ----------- --------- ---
> >> 0) 00:60:dd:47:ad:68 indus4:0 1,1
> >> 1) 00:60:dd:47:b3:e8 indus4:1 7,3
> >> 2) 00:60:dd:47:ad:77 jhelum1:0 7,3
> >> 3) 00:60:dd:47:ad:66 indus3:0 1,1
> >> 4) 00:60:dd:47:ad:76 indus3:1 7,3
> >> 5) 00:60:dd:47:b3:ab indus2:0 1,1
> >> 7) 00:60:dd:47:ad:7c indus1:0 7,3
> >> 8) 00:60:dd:47:b3:5a ravi2:0 7,3
> >> 9) 00:60:dd:47:ad:5f ravi2:1 8,3
> >> 10) 00:60:dd:47:b3:bf ravi1:0 7,3
> >> ===================================================================
> >> Instance #1: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
> >> Status: Running, P0: Link up
> >> MAC Address: 00:60:dd:47:b3:e8
> >> Product code: M3F-PCIXF-2
> >> Part number: 09-03392
> >> Serial number: 296575
> >> Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
> >> Mapped hosts: 10
> >>
> >> ROUTE COUNT
> >> INDEX MAC ADDRESS HOST NAME P0
> >> ----- ----------- --------- ---
> >> 0) 00:60:dd:47:ad:68 indus4:0 6,3
> >> 1) 00:60:dd:47:b3:e8 indus4:1 1,1
> >> 2) 00:60:dd:47:ad:77 jhelum1:0 1,1
> >> 3) 00:60:dd:47:ad:66 indus3:0 8,3
> >> 4) 00:60:dd:47:ad:76 indus3:1 1,1
> >> 5) 00:60:dd:47:b3:ab indus2:0 8,3
> >> 7) 00:60:dd:47:ad:7c indus1:0 7,3
> >> 8) 00:60:dd:47:b3:5a ravi2:0 6,3
> >> 9) 00:60:dd:47:ad:5f ravi2:1 8,3
> >> 10) 00:60:dd:47:b3:bf ravi1:0 8,3
> >>
> >> The output from *ompi_info* is:
> >>
> >> Open MPI: 1.2.1r14096-ct7b030r1838
> >> Open MPI SVN revision: 0
> >> Open RTE: 1.2.1r14096-ct7b030r1838
> >> Open RTE SVN revision: 0
> >> OPAL: 1.2.1r14096-ct7b030r1838
> >> OPAL SVN revision: 0
> >> Prefix: /opt/SUNWhpc/HPC7.0
> >> Configured architecture: sparc-sun-solaris2.10
> >> Configured by: root
> >> Configured on: Fri Mar 30 12:49:36 EDT 2007
> >> Configure host: burpen-on10-0
> >> Built by: root
> >> Built on: Fri Mar 30 13:10:46 EDT 2007
> >> Built host: burpen-on10-0
> >> C bindings: yes
> >> C++ bindings: yes
> >> Fortran77 bindings: yes (all)
> >> Fortran90 bindings: yes
> >> Fortran90 bindings size: trivial
> >> C compiler: cc
> >> C compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/cc
> >> C++ compiler: CC
> >> C++ compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/CC
> >> Fortran77 compiler: f77
> >> Fortran77 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f77
> >> Fortran90 compiler: f95
> >> Fortran90 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f95
> >> C profiling: yes
> >> C++ profiling: yes
> >> Fortran77 profiling: yes
> >> Fortran90 profiling: yes
> >> C++ exceptions: yes
> >> Thread support: no
> >> Internal debug support: no
> >> MPI parameter check: runtime
> >> Memory profiling support: no
> >> Memory debugging support: no
> >> libltdl support: yes
> >> Heterogeneous support: yes
> >> mpirun default --prefix: yes
> >> MCA backtrace: printstack (MCA v1.0, API v1.0, Component v1.2.1)
> >> MCA paffinity: solaris (MCA v1.0, API v1.0, Component v1.2.1)
> >> MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.1)
> >> MCA timer: solaris (MCA v1.0, API v1.0, Component v1.2.1)
> >> MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
> >> MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
> >> MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.1)
> >> MCA coll: self (MCA v1.0, API v1.0, Component v1.2.1)
> >> MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.1)
> >> MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.1)
> >> MCA io: romio (MCA v1.0, API v1.0, Component v1.2.1)
> >> MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.1)
> >> MCA mpool: udapl (MCA v1.0, API v1.0, Component v1.2.1)
> >> MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.1)
> >> MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.1)
> >> MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.1)
> >> MCA rcache: rb (MCA v1.0, API v1.0, Component v1.2.1)
> >> MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.1)
> >> MCA btl: mx (MCA v1.0, API v1.0.1, Component v1.2.1)
> >> MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.1)
> >> MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.1)
> >> MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
> >> MCA btl: udapl (MCA v1.0, API v1.0, Component v1.2.1)
> >> MCA mtl: mx (MCA v1.0, API v1.0, Component v1.2.1)
> >> MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.1)
> >> MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.1)
> >> MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.1)
> >> MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.1)
> >> MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.1)
> >> MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.1)
> >> MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.1)
> >> MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.1)
> >> MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.1)
> >> MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.1)
> >> MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.1)
> >> MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.1)
> >> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
> >> MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.1)
> >> MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.1)
> >> MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.1)
> >> MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.1)
> >> MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.1)
> >> MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.1)
> >> MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.1)
> >> MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2.1)
> >> MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.1)
> >> MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.1)
> >> MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.1)
> >> MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.1)
> >> MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.1)
> >> MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.1)
> >> MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.1)
> >> MCA sds: env (MCA v1.0, API v1.0, Component v1.2.1)
> >> MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.1)
> >> MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.1)
> >> MCA sds: singleton (MCA v1.0, API v1.0, Component v1.2.1)
> >>
> >> When I try to run a simple hello world program by issuing the following
> >> command:
> >>
> >> mpirun -np 4 -mca btl mx,sm,self -machinefile machines ./hello
> >>
> >> the following error appears:
> >>
> >> ------------------------------------------------------------------------
> >>--
> >>
> >> Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
> >> If you specified the use of a BTL component, you may have
> >> forgotten a component (such as "self") in the list of
> >> usable components.
> >> ------------------------------------------------------------------------
> >>--
> >>
> >> ------------------------------------------------------------------------
> >>--
> >>
> >> It looks like MPI_INIT failed for some reason; your parallel process is
> >> likely to abort. There are many reasons that a parallel process can
> >> fail during MPI_INIT; some of which are due to configuration or
> >> environment
> >> problems. This failure appears to be an internal failure; here's some
> >> additional information (which may only be relevant to an Open MPI
> >> developer):
> >>
> >> PML add procs failed
> >> --> Returned "Unreachable" (-12) instead of "Success" (0)
> >> ------------------------------------------------------------------------
> >>--
> >>
> >> *** An error occurred in MPI_Init
> >> *** before MPI was initialized
> >> *** MPI_ERRORS_ARE_FATAL (goodbye)
> >> ------------------------------------------------------------------------
> >>--
> >>
> >> Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
> >> If you specified the use of a BTL component, you may have
> >> forgotten a component (such as "self") in the list of
> >> usable components.
> >> ------------------------------------------------------------------------
> >>--
> >>
> >> ------------------------------------------------------------------------
> >>--
> >>
> >> It looks like MPI_INIT failed for some reason; your parallel process is
> >> likely to abort. There are many reasons that a parallel process can
> >> fail during MPI_INIT; some of which are due to configuration or
> >> environment
> >> problems. This failure appears to be an internal failure; here's some
> >> additional information (which may only be relevant to an Open MPI
> >> developer):
> >>
> >> PML add procs failed
> >> --> Returned "Unreachable" (-12) instead of "Success" (0)
> >> ------------------------------------------------------------------------
> >>--
> >>
> >> *** An error occurred in MPI_Init
> >> *** before MPI was initialized
> >> *** MPI_ERRORS_ARE_FATAL (goodbye)
> >> ------------------------------------------------------------------------
> >>--
> >>
> >> Process 0.1.3 is unable to reach 0.1.0 for MPI communication.
> >> If you specified the use of a BTL component, you may have
> >> forgotten a component (such as "self") in the list of
> >> usable components.
> >> ------------------------------------------------------------------------
> >>--
> >>
> >> ------------------------------------------------------------------------
> >>--
> >>
> >> It looks like MPI_INIT failed for some reason; your parallel process is
> >> likely to abort. There are many reasons that a parallel process can
> >> fail during MPI_INIT; some of which are due to configuration or
> >> environment
> >> problems. This failure appears to be an internal failure; here's some
> >> additional information (which may only be relevant to an Open MPI
> >> developer):
> >>
> >> PML add procs failed
> >> --> Returned "Unreachable" (-12) instead of "Success" (0)
> >> ------------------------------------------------------------------------
> >>--
> >>
> >> ------------------------------------------------------------------------
> >>--
> >>
> >> Process 0.1.2 is unable to reach 0.1.0 for MPI communication.
> >> If you specified the use of a BTL component, you may have
> >> forgotten a component (such as "self") in the list of
> >> usable components.
> >> ------------------------------------------------------------------------
> >>--
> >>
> >> ------------------------------------------------------------------------
> >>--
> >>
> >> It looks like MPI_INIT failed for some reason; your parallel process is
> >> likely to abort. There are many reasons that a parallel process can
> >> fail during MPI_INIT; some of which are due to configuration or
> >> environment
> >> problems. This failure appears to be an internal failure; here's some
> >> additional information (which may only be relevant to an Open MPI
> >> developer):
> >>
> >> PML add procs failed
> >> --> Returned "Unreachable" (-*** An error occurred in MPI_Init
> >> *** before MPI was initialized
> >> *** MPI_ERRORS_ARE_FATAL (goodbye)
> >> 12) instead of "Success" (0)
> >> ------------------------------------------------------------------------
> >>--
> >>
> >> *** An error occurred in MPI_Init
> >> *** before MPI was initialized
> >> *** MPI_ERRORS_ARE_FATAL (goodbye)
> >>
> >> The output from *more /var/run/fms/fma.log* is:
> >>
> >> Sat Sep 22 10:47:50 2007 NIC 0: M3F-PCIXF-2 s/n=297218 1 ports, speed=2G
> >> Sat Sep 22 10:47:50 2007 mac = 00:60:dd:47:ad:7c
> >> Sat Sep 22 10:47:50 2007 NIC 1: M3F-PCIXF-2 s/n=297248 1 ports, speed=2G
> >> Sat Sep 22 10:47:50 2007 mac = 00:60:dd:47:ad:5e
> >> Sat Sep 22 10:47:50 2007 fms-1.2.1 fma starting
> >> Sat Sep 22 10:47:50 2007 Mapper was 00:00:00:00:00:00, l=0, is now
> >> 00:60:dd:47:ad:7c, l=1
> >> Sat Sep 22 10:47:50 2007 Mapping fabric...
> >> Sat Sep 22 10:47:54 2007 Mapper was 00:60:dd:47:ad:7c, l=1, is now
> >> 00:60:dd:47:b3:e8, l=1
> >> Sat Sep 22 10:47:54 2007 Cancelling mapping
> >> Sat Sep 22 10:47:59 2007 5 hosts, 8 nics, 6 xbars, 40 links
> >> Sat Sep 22 10:47:59 2007 map version is 1987557551
> >> Sat Sep 22 10:47:59 2007 Found NIC 0 at index 3!
> >> Sat Sep 22 10:47:59 2007 Found NIC 1 at index 2!
> >> Sat Sep 22 10:47:59 2007 map seems OK
> >> Sat Sep 22 10:47:59 2007 Routing took 0 seconds
> >> Mon Sep 24 14:26:46 2007 Requesting remap from indus4
> >> (00:60:dd:47:b3:e8): scouted by 00:60:dd:47:b3:5a, lev=1, pkt_type=0
> >> Mon Sep 24 14:26:51 2007 6 hosts, 10 nics, 6 xbars, 42 links
> >> Mon Sep 24 14:26:51 2007 map version is 1987557552
> >> Mon Sep 24 14:26:51 2007 Found NIC 0 at index 3!
> >> Mon Sep 24 14:26:51 2007 Found NIC 1 at index 2!
> >> Mon Sep 24 14:26:51 2007 map seems OK
> >> Mon Sep 24 14:26:51 2007 Routing took 0 seconds
> >> Mon Sep 24 14:35:17 2007 Requesting remap from indus4
> >> (00:60:dd:47:b3:e8): scouted by 00:60:dd:47:b3:bf, lev=1, pkt_type=0
> >> Mon Sep 24 14:35:19 2007 7 hosts, 11 nics, 6 xbars, 43 links
> >> Mon Sep 24 14:35:19 2007 map version is 1987557553
> >> Mon Sep 24 14:35:19 2007 Found NIC 0 at index 5!
> >> Mon Sep 24 14:35:19 2007 Found NIC 1 at index 4!
> >> Mon Sep 24 14:35:19 2007 map seems OK
> >> Mon Sep 24 14:35:19 2007 Routing took 0 seconds
> >> Tue Sep 25 21:47:52 2007 6 hosts, 9 nics, 6 xbars, 41 links
> >> Tue Sep 25 21:47:52 2007 map version is 1987557554
> >> Tue Sep 25 21:47:52 2007 Found NIC 0 at index 3!
> >> Tue Sep 25 21:47:52 2007 Found NIC 1 at index 2!
> >> Tue Sep 25 21:47:52 2007 map seems OK
> >> Tue Sep 25 21:47:52 2007 Routing took 0 seconds
> >> Tue Sep 25 21:52:02 2007 Requesting remap from indus4
> >> (00:60:dd:47:b3:e8): empty port x0p15 is no longer empty
> >> Tue Sep 25 21:52:07 2007 6 hosts, 10 nics, 6 xbars, 42 links
> >> Tue Sep 25 21:52:07 2007 map version is 1987557555
> >> Tue Sep 25 21:52:07 2007 Found NIC 0 at index 4!
> >> Tue Sep 25 21:52:07 2007 Found NIC 1 at index 3!
> >> Tue Sep 25 21:52:07 2007 map seems OK
> >> Tue Sep 25 21:52:07 2007 Routing took 0 seconds
> >> Tue Sep 25 21:52:23 2007 7 hosts, 11 nics, 6 xbars, 43 links
> >> Tue Sep 25 21:52:23 2007 map version is 1987557556
> >> Tue Sep 25 21:52:23 2007 Found NIC 0 at index 6!
> >> Tue Sep 25 21:52:23 2007 Found NIC 1 at index 5!
> >> Tue Sep 25 21:52:23 2007 map seems OK
> >> Tue Sep 25 21:52:23 2007 Routing took 0 seconds
> >> Wed Sep 26 05:07:01 2007 Requesting remap from indus4
> >> (00:60:dd:47:b3:e8): verify failed x1p2, nic 0, port 0 route=-9 4 10
> >> reply=-10 -4 9 , remote=ravi2 NIC
> >> 1, p0 mac=00:60:dd:47:ad:5f
> >> Wed Sep 26 05:07:06 2007 6 hosts, 9 nics, 6 xbars, 41 links
> >> Wed Sep 26 05:07:06 2007 map version is 1987557557
> >> Wed Sep 26 05:07:06 2007 Found NIC 0 at index 3!
> >> Wed Sep 26 05:07:06 2007 Found NIC 1 at index 2!
> >> Wed Sep 26 05:07:06 2007 map seems OK
> >> Wed Sep 26 05:07:06 2007 Routing took 0 seconds
> >> Wed Sep 26 05:11:19 2007 7 hosts, 11 nics, 6 xbars, 43 links
> >> Wed Sep 26 05:11:19 2007 map version is 1987557558
> >> Wed Sep 26 05:11:19 2007 Found NIC 0 at index 3!
> >> Wed Sep 26 05:11:19 2007 Found NIC 1 at index 2!
> >> Wed Sep 26 05:11:19 2007 map seems OK
> >> Wed Sep 26 05:11:19 2007 Routing took 0 seconds
> >> Thu Sep 27 11:45:37 2007 6 hosts, 9 nics, 6 xbars, 41 links
> >> Thu Sep 27 11:45:37 2007 map version is 1987557559
> >> Thu Sep 27 11:45:37 2007 Found NIC 0 at index 6!
> >> Thu Sep 27 11:45:37 2007 Found NIC 1 at index 5!
> >> Thu Sep 27 11:45:37 2007 map seems OK
> >> Thu Sep 27 11:45:37 2007 Routing took 0 seconds
> >> Thu Sep 27 11:51:02 2007 7 hosts, 11 nics, 6 xbars, 43 links
> >> Thu Sep 27 11:51:02 2007 map version is 1987557560
> >> Thu Sep 27 11:51:02 2007 Found NIC 0 at index 6!
> >> Thu Sep 27 11:51:02 2007 Found NIC 1 at index 5!
> >> Thu Sep 27 11:51:02 2007 map seems OK
> >> Thu Sep 27 11:51:02 2007 Routing took 0 seconds
> >> Fri Sep 28 13:27:10 2007 Requesting remap from indus4
> >> (00:60:dd:47:b3:e8): verify failed x5p0, nic 1, port 0 route=-8 15 6
> >> reply=-6 -15 8 , remote=ravi1 NIC
> >> 0, p0 mac=00:60:dd:47:b3:bf
> >> Fri Sep 28 13:27:24 2007 6 hosts, 8 nics, 6 xbars, 40 links
> >> Fri Sep 28 13:27:24 2007 map version is 1987557561
> >> Fri Sep 28 13:27:24 2007 Found NIC 0 at index 5!
> >> Fri Sep 28 13:27:24 2007 Cannot find NIC 1 (00:60:dd:47:ad:5e) in map!
> >> Fri Sep 28 13:27:24 2007 map seems OK
> >> Fri Sep 28 13:27:24 2007 Routing took 0 seconds
> >> Fri Sep 28 13:27:44 2007 7 hosts, 10 nics, 6 xbars, 42 links
> >> Fri Sep 28 13:27:44 2007 map version is 1987557562
> >> Fri Sep 28 13:27:44 2007 Found NIC 0 at index 7!
> >> Fri Sep 28 13:27:44 2007 Cannot find NIC 1 (00:60:dd:47:ad:5e) in map!
> >> Fri Sep 28 13:27:44 2007 map seems OK
> >> Fri Sep 28 13:27:44 2007 Routing took 0 seconds
> >>
> >> Do you have any suggestions or comments on why this error appears and
> >> what the solution to this problem might be? I have checked the community
> >> mailing list for this problem and found a few related topics, but could
> >> not find any solution. Any suggestions or comments will be highly
> >> appreciated.
> >>
> >> The code that I am trying to run is as follows:
> >>
> >> #include <stdio.h>
> >> #include <string.h>   /* for strcpy */
> >> #include "mpi.h"
> >>
> >> int main(int argc, char **argv)
> >> {
> >>     int rank, size, tag, rc, i;
> >>     MPI_Status status;
> >>     char message[20];
> >>
> >>     rc = MPI_Init(&argc, &argv);
> >>     rc = MPI_Comm_size(MPI_COMM_WORLD, &size);
> >>     rc = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >>     tag = 100;
> >>
> >>     if (rank == 0) {
> >>         /* rank 0 sends the greeting to every other rank */
> >>         strcpy(message, "Hello, world");
> >>         for (i = 1; i < size; i++)
> >>             rc = MPI_Send(message, 13, MPI_CHAR, i, tag, MPI_COMM_WORLD);
> >>     } else {
> >>         rc = MPI_Recv(message, 13, MPI_CHAR, 0, tag, MPI_COMM_WORLD,
> >>                       &status);
> >>     }
> >>     printf("node %d : %.13s\n", rank, message);
> >>
> >>     rc = MPI_Finalize();
> >>     return 0;
> >> }
> >>
> >> Thanks.
> >> Looking forward.
> >> Best regards,
> >> Hammad Siddiqi
> >> Center for High Performance Scientific Computing
> >> NUST Institute of Information Technology,
> >> National University of Sciences and Technology,
> >> Rawalpindi, Pakistan.
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users