Open MPI User's Mailing List Archives

From: Hammad Siddiqi (hammad.siddiqi_at_[hidden])
Date: 2007-09-29 01:53:06


Hi Terry,

Thanks for replying. The following command is working fine:

/opt/SUNWhpc/HPC7.0/bin/mpirun -np 4 -mca btl tcp,sm,self -machinefile
machines ./hello

The contents of the machines file are:
indus1
indus2
indus3
indus4

I have tried using np=2 over pairs of machines, but the problem is the same.
The errors are given below, along with the commands I tried.

**Test 1**

/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host
"indus1,indus2" ./hello
--------------------------------------------------------------------------
Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--------------------------------------------------------------------------
Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)

**Test 2**

/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host
"indus1,indus3" ./hello
--------------------------------------------------------------------------
Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--------------------------------------------------------------------------
Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)

**Test 3**

/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host
"indus1,indus4" ./hello
--------------------------------------------------------------------------
Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--------------------------------------------------------------------------
Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)

**Test 4**

/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host
"indus2,indus4" ./hello
--------------------------------------------------------------------------
Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--------------------------------------------------------------------------
Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)

**Test 5**

/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host
"indus2,indus3" ./hello
--------------------------------------------------------------------------
Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--------------------------------------------------------------------------
Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)

**Test 6**

/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host
"indus3,indus4" ./hello
--------------------------------------------------------------------------
Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--------------------------------------------------------------------------
Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)

**END OF TESTS**

One thing to note: when I run the command with -mca pml cm added, it works fine:

mpirun -np 4 -mca btl mx,sm,self -mca pml cm -machinefile machines ./hello
Hello MPI! Process 4 of 1 on indus2
Hello MPI! Process 4 of 2 on indus3
Hello MPI! Process 4 of 3 on indus4
Hello MPI! Process 4 of 0 on indus1

To my knowledge this command is not using shared memory and is using only
Myrinet as the interconnect.
One more thing: I cannot start more than four processes in this case; the
mpirun process hangs.
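
If it helps with debugging, here is roughly what I can run to get more detail on
why the mx BTL path fails (the exact verbosity parameter and level are my best
guess; please correct me if there is a better way):

/opt/SUNWhpc/HPC7.0/bin/ompi_info --param btl mx
/opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -mca btl_base_verbose 30 \
    -host "indus1,indus2" ./hello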

Any suggestions?

Once again, thanks for your help.

Regards,
Hammad

Terry Dontje wrote:
> Hi Hammad,
>
> It looks to me like none of the BTLs could resolve a route between the
> node that process rank 0 is on and the other nodes.
> I would suggest trying np=2 over a couple of pairs of machines to see if
> that works, so you can be truly sure that only the
> first node is having this problem.
>
> It also might be helpful as a sanity check to use the tcp btl instead of
> mx and see if you get more traction with that.
>
> --td
>
>
>> From: Hammad Siddiqi (hammad.siddiqi_at_[hidden])
>> Date: 2007-09-28 07:38:01
>>
>>
>
>
>> Hello,
>>
>> I am using Sun HPC ClusterTools 7.0 to compile and run my C MPI programs.
>>
>> I have tested the Myrinet installation using Myricom's own test programs.
>> The Myricom software stack I am using is MX, version mx2g-1.1.7; the
>> mx_mapper is also in use.
>> We have 4 nodes (Sun Fire V890), each with 8 dual-core processors, and the
>> operating system is
>> Solaris 10 (SunOS indus1 5.10 Generic_125100-10 sun4u sparc
>> SUNW,Sun-Fire-V890).
>>
>> The contents of machine file are:
>> indus1
>> indus2
>> indus3
>> indus4
>>
>> The output of *mx_info* on each node is given below
>>
>> ======
>> *indus1*
>> ======
>>
>> MX Version: 1.1.7rc3cvs1_1_fixes
>> MX Build: @indus4:/opt/mx2g-1.1.7rc3 Thu May 31 11:36:59 PKT 2007
>> 2 Myrinet boards installed.
>> The MX driver is configured to support up to 4 instances and 1024 nodes.
>> ===================================================================
>> Instance #0: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
>> Status: Running, P0: Link up
>> MAC Address: 00:60:dd:47:ad:7c
>> Product code: M3F-PCIXF-2
>> Part number: 09-03392
>> Serial number: 297218
>> Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
>> Mapped hosts: 10
>>
>>
>> ROUTE
>> COUNT
>> INDEX MAC ADDRESS HOST NAME P0
>> ----- ----------- --------- ---
>> 0) 00:60:dd:47:ad:7c indus1:0 1,1
>> 2) 00:60:dd:47:ad:68 indus4:0 8,3
>> 3) 00:60:dd:47:b3:e8 indus4:1 7,3
>> 4) 00:60:dd:47:b3:ab indus2:0 7,3
>> 5) 00:60:dd:47:ad:66 indus3:0 8,3
>> 6) 00:60:dd:47:ad:76 indus3:1 8,3
>> 7) 00:60:dd:47:ad:77 jhelum1:0 8,3
>> 8) 00:60:dd:47:b3:5a ravi2:0 8,3
>> 9) 00:60:dd:47:ad:5f ravi2:1 1,1
>> 10) 00:60:dd:47:b3:bf ravi1:0 8,3
>> ===================================================================
>>
>> ======
>> *indus2*
>> ======
>>
>> MX Version: 1.1.7rc3cvs1_1_fixes
>> MX Build: @indus2:/opt/mx2g-1.1.7rc3 Thu May 31 11:24:03 PKT 2007
>> 2 Myrinet boards installed.
>> The MX driver is configured to support up to 4 instances and 1024 nodes.
>> ===================================================================
>> Instance #0: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
>> Status: Running, P0: Link up
>> MAC Address: 00:60:dd:47:b3:ab
>> Product code: M3F-PCIXF-2
>> Part number: 09-03392
>> Serial number: 296636
>> Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
>> Mapped hosts: 10
>>
>> ROUTE
>> COUNT
>> INDEX MAC ADDRESS HOST NAME P0
>> ----- ----------- --------- ---
>> 0) 00:60:dd:47:b3:ab indus2:0 1,1
>> 2) 00:60:dd:47:ad:68 indus4:0 1,1
>> 3) 00:60:dd:47:b3:e8 indus4:1 8,3
>> 4) 00:60:dd:47:ad:66 indus3:0 1,1
>> 5) 00:60:dd:47:ad:76 indus3:1 7,3
>> 6) 00:60:dd:47:ad:77 jhelum1:0 7,3
>> 8) 00:60:dd:47:ad:7c indus1:0 8,3
>> 9) 00:60:dd:47:b3:5a ravi2:0 8,3
>> 10) 00:60:dd:47:ad:5f ravi2:1 8,3
>> 11) 00:60:dd:47:b3:bf ravi1:0 7,3
>> ===================================================================
>> Instance #1: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
>> Status: Running, P0: Link down
>> MAC Address: 00:60:dd:47:b3:c3
>> Product code: M3F-PCIXF-2
>> Part number: 09-03392
>> Serial number: 296612
>> Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
>> Mapped hosts: 10
>>
>> ======
>> *indus3*
>> ======
>> MX Version: 1.1.7rc3cvs1_1_fixes
>> MX Build: @indus3:/opt/mx2g-1.1.7rc3 Thu May 31 11:29:03 PKT 2007
>> 2 Myrinet boards installed.
>> The MX driver is configured to support up to 4 instances and 1024 nodes.
>> ===================================================================
>> Instance #0: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
>> Status: Running, P0: Link up
>> MAC Address: 00:60:dd:47:ad:66
>> Product code: M3F-PCIXF-2
>> Part number: 09-03392
>> Serial number: 297240
>> Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
>> Mapped hosts: 10
>>
>> ROUTE
>> COUNT
>> INDEX MAC ADDRESS HOST NAME P0
>> ----- ----------- --------- ---
>> 0) 00:60:dd:47:ad:66 indus3:0 1,1
>> 1) 00:60:dd:47:ad:76 indus3:1 8,3
>> 2) 00:60:dd:47:ad:68 indus4:0 1,1
>> 3) 00:60:dd:47:b3:e8 indus4:1 6,3
>> 4) 00:60:dd:47:ad:77 jhelum1:0 8,3
>> 5) 00:60:dd:47:b3:ab indus2:0 1,1
>> 7) 00:60:dd:47:ad:7c indus1:0 8,3
>> 8) 00:60:dd:47:b3:5a ravi2:0 8,3
>> 9) 00:60:dd:47:ad:5f ravi2:1 7,3
>> 10) 00:60:dd:47:b3:bf ravi1:0 8,3
>> ===================================================================
>> Instance #1: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
>> Status: Running, P0: Link up
>> MAC Address: 00:60:dd:47:ad:76
>> Product code: M3F-PCIXF-2
>> Part number: 09-03392
>> Serial number: 297224
>> Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
>> Mapped hosts: 10
>>
>> ROUTE
>> COUNT
>> INDEX MAC ADDRESS HOST NAME P0
>> ----- ----------- --------- ---
>> 0) 00:60:dd:47:ad:66 indus3:0 8,3
>> 1) 00:60:dd:47:ad:76 indus3:1 1,1
>> 2) 00:60:dd:47:ad:68 indus4:0 7,3
>> 3) 00:60:dd:47:b3:e8 indus4:1 1,1
>> 4) 00:60:dd:47:ad:77 jhelum1:0 1,1
>> 5) 00:60:dd:47:b3:ab indus2:0 7,3
>> 7) 00:60:dd:47:ad:7c indus1:0 8,3
>> 8) 00:60:dd:47:b3:5a ravi2:0 6,3
>> 9) 00:60:dd:47:ad:5f ravi2:1 8,3
>> 10) 00:60:dd:47:b3:bf ravi1:0 8,3
>>
>> ======
>> *indus4*
>> ======
>>
>> MX Version: 1.1.7rc3cvs1_1_fixes
>> MX Build: @indus4:/opt/mx2g-1.1.7rc3 Thu May 31 11:36:59 PKT 2007
>> 2 Myrinet boards installed.
>> The MX driver is configured to support up to 4 instances and 1024 nodes.
>> ===================================================================
>> Instance #0: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
>> Status: Running, P0: Link up
>> MAC Address: 00:60:dd:47:ad:68
>> Product code: M3F-PCIXF-2
>> Part number: 09-03392
>> Serial number: 297238
>> Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
>> Mapped hosts: 10
>>
>> ROUTE
>> COUNT
>> INDEX MAC ADDRESS HOST NAME P0
>> ----- ----------- --------- ---
>> 0) 00:60:dd:47:ad:68 indus4:0 1,1
>> 1) 00:60:dd:47:b3:e8 indus4:1 7,3
>> 2) 00:60:dd:47:ad:77 jhelum1:0 7,3
>> 3) 00:60:dd:47:ad:66 indus3:0 1,1
>> 4) 00:60:dd:47:ad:76 indus3:1 7,3
>> 5) 00:60:dd:47:b3:ab indus2:0 1,1
>> 7) 00:60:dd:47:ad:7c indus1:0 7,3
>> 8) 00:60:dd:47:b3:5a ravi2:0 7,3
>> 9) 00:60:dd:47:ad:5f ravi2:1 8,3
>> 10) 00:60:dd:47:b3:bf ravi1:0 7,3
>> ===================================================================
>> Instance #1: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
>> Status: Running, P0: Link up
>> MAC Address: 00:60:dd:47:b3:e8
>> Product code: M3F-PCIXF-2
>> Part number: 09-03392
>> Serial number: 296575
>> Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
>> Mapped hosts: 10
>>
>> ROUTE
>> COUNT
>> INDEX MAC ADDRESS HOST NAME P0
>> ----- ----------- --------- ---
>> 0) 00:60:dd:47:ad:68 indus4:0 6,3
>> 1) 00:60:dd:47:b3:e8 indus4:1 1,1
>> 2) 00:60:dd:47:ad:77 jhelum1:0 1,1
>> 3) 00:60:dd:47:ad:66 indus3:0 8,3
>> 4) 00:60:dd:47:ad:76 indus3:1 1,1
>> 5) 00:60:dd:47:b3:ab indus2:0 8,3
>> 7) 00:60:dd:47:ad:7c indus1:0 7,3
>> 8) 00:60:dd:47:b3:5a ravi2:0 6,3
>> 9) 00:60:dd:47:ad:5f ravi2:1 8,3
>> 10) 00:60:dd:47:b3:bf ravi1:0 8,3
>>
>> The output from *ompi_info* is:
>>
>> Open MPI: 1.2.1r14096-ct7b030r1838
>> Open MPI SVN revision: 0
>> Open RTE: 1.2.1r14096-ct7b030r1838
>> Open RTE SVN revision: 0
>> OPAL: 1.2.1r14096-ct7b030r1838
>> OPAL SVN revision: 0
>> Prefix: /opt/SUNWhpc/HPC7.0
>> Configured architecture: sparc-sun-solaris2.10
>> Configured by: root
>> Configured on: Fri Mar 30 12:49:36 EDT 2007
>> Configure host: burpen-on10-0
>> Built by: root
>> Built on: Fri Mar 30 13:10:46 EDT 2007
>> Built host: burpen-on10-0
>> C bindings: yes
>> C++ bindings: yes
>> Fortran77 bindings: yes (all)
>> Fortran90 bindings: yes
>> Fortran90 bindings size: trivial
>> C compiler: cc
>> C compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/cc
>> C++ compiler: CC
>> C++ compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/CC
>> Fortran77 compiler: f77
>> Fortran77 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f77
>> Fortran90 compiler: f95
>> Fortran90 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f95
>> C profiling: yes
>> C++ profiling: yes
>> Fortran77 profiling: yes
>> Fortran90 profiling: yes
>> C++ exceptions: yes
>> Thread support: no
>> Internal debug support: no
>> MPI parameter check: runtime
>> Memory profiling support: no
>> Memory debugging support: no
>> libltdl support: yes
>> Heterogeneous support: yes
>> mpirun default --prefix: yes
>> MCA backtrace: printstack (MCA v1.0, API v1.0, Component
>> v1.2.1)
>> MCA paffinity: solaris (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA maffinity: first_use (MCA v1.0, API v1.0, Component
>> v1.2.1)
>> MCA timer: solaris (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
>> MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
>> MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA coll: self (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA io: romio (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA mpool: udapl (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA rcache: rb (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA btl: mx (MCA v1.0, API v1.0.1, Component v1.2.1)
>> MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.1)
>> MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.1)
>> MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
>> MCA btl: udapl (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA mtl: mx (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.1)
>> MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.1)
>> MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.1)
>> MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.1)
>> MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.1)
>> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
>> MCA ras: dash_host (MCA v1.0, API v1.3, Component
>> v1.2.1)
>> MCA ras: gridengine (MCA v1.0, API v1.3, Component
>> v1.2.1)
>> MCA ras: localhost (MCA v1.0, API v1.3, Component
>> v1.2.1)
>> MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.1)
>> MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.1)
>> MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.1)
>> MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.1)
>> MCA rmaps: round_robin (MCA v1.0, API v1.3, Component
>> v1.2.1)
>> MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.1)
>> MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.1)
>> MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA pls: gridengine (MCA v1.0, API v1.3, Component
>> v1.2.1)
>> MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.1)
>> MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.1)
>> MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.1)
>> MCA sds: env (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.1)
>> MCA sds: singleton (MCA v1.0, API v1.0, Component
>> v1.2.1)
>>
>> When I try to run a simple hello world program by issuing the following
>> command:
>>
>> mpirun -np 4 -mca btl mx,sm,self -machinefile machines ./hello
>>
>> The following error appears:
>>
>> --------------------------------------------------------------------------
>>
>> Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
>> If you specified the use of a BTL component, you may have
>> forgotten a component (such as "self") in the list of
>> usable components.
>> --------------------------------------------------------------------------
>>
>> --------------------------------------------------------------------------
>>
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or
>> environment
>> problems. This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>>
>> PML add procs failed
>> --> Returned "Unreachable" (-12) instead of "Success" (0)
>> --------------------------------------------------------------------------
>>
>> *** An error occurred in MPI_Init
>> *** before MPI was initialized
>> *** MPI_ERRORS_ARE_FATAL (goodbye)
>> --------------------------------------------------------------------------
>>
>> Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
>> If you specified the use of a BTL component, you may have
>> forgotten a component (such as "self") in the list of
>> usable components.
>> --------------------------------------------------------------------------
>>
>> --------------------------------------------------------------------------
>>
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or
>> environment
>> problems. This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>>
>> PML add procs failed
>> --> Returned "Unreachable" (-12) instead of "Success" (0)
>> --------------------------------------------------------------------------
>>
>> *** An error occurred in MPI_Init
>> *** before MPI was initialized
>> *** MPI_ERRORS_ARE_FATAL (goodbye)
>> --------------------------------------------------------------------------
>>
>> Process 0.1.3 is unable to reach 0.1.0 for MPI communication.
>> If you specified the use of a BTL component, you may have
>> forgotten a component (such as "self") in the list of
>> usable components.
>> --------------------------------------------------------------------------
>>
>> --------------------------------------------------------------------------
>>
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or
>> environment
>> problems. This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>>
>> PML add procs failed
>> --> Returned "Unreachable" (-12) instead of "Success" (0)
>> --------------------------------------------------------------------------
>>
>> --------------------------------------------------------------------------
>>
>> Process 0.1.2 is unable to reach 0.1.0 for MPI communication.
>> If you specified the use of a BTL component, you may have
>> forgotten a component (such as "self") in the list of
>> usable components.
>> --------------------------------------------------------------------------
>>
>> --------------------------------------------------------------------------
>>
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or
>> environment
>> problems. This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>>
>> PML add procs failed
>> --> Returned "Unreachable" (-12) instead of "Success" (0)
>> *** An error occurred in MPI_Init
>> *** before MPI was initialized
>> *** MPI_ERRORS_ARE_FATAL (goodbye)
>> --------------------------------------------------------------------------
>>
>> *** An error occurred in MPI_Init
>> *** before MPI was initialized
>> *** MPI_ERRORS_ARE_FATAL (goodbye)
>>
>> The output of more */var/run/fms/fma.log* is:
>>
>> Sat Sep 22 10:47:50 2007 NIC 0: M3F-PCIXF-2 s/n=297218 1 ports, speed=2G
>> Sat Sep 22 10:47:50 2007 mac = 00:60:dd:47:ad:7c
>> Sat Sep 22 10:47:50 2007 NIC 1: M3F-PCIXF-2 s/n=297248 1 ports, speed=2G
>> Sat Sep 22 10:47:50 2007 mac = 00:60:dd:47:ad:5e
>> Sat Sep 22 10:47:50 2007 fms-1.2.1 fma starting
>> Sat Sep 22 10:47:50 2007 Mapper was 00:00:00:00:00:00, l=0, is now
>> 00:60:dd:47:ad:7c, l=1
>> Sat Sep 22 10:47:50 2007 Mapping fabric...
>> Sat Sep 22 10:47:54 2007 Mapper was 00:60:dd:47:ad:7c, l=1, is now
>> 00:60:dd:47:b3:e8, l=1
>> Sat Sep 22 10:47:54 2007 Cancelling mapping
>> Sat Sep 22 10:47:59 2007 5 hosts, 8 nics, 6 xbars, 40 links
>> Sat Sep 22 10:47:59 2007 map version is 1987557551
>> Sat Sep 22 10:47:59 2007 Found NIC 0 at index 3!
>> Sat Sep 22 10:47:59 2007 Found NIC 1 at index 2!
>> Sat Sep 22 10:47:59 2007 map seems OK
>> Sat Sep 22 10:47:59 2007 Routing took 0 seconds
>> Mon Sep 24 14:26:46 2007 Requesting remap from indus4
>> (00:60:dd:47:b3:e8): scouted by 00:60:dd:47:b3:5a, lev=1, pkt_type=0
>> Mon Sep 24 14:26:51 2007 6 hosts, 10 nics, 6 xbars, 42 links
>> Mon Sep 24 14:26:51 2007 map version is 1987557552
>> Mon Sep 24 14:26:51 2007 Found NIC 0 at index 3!
>> Mon Sep 24 14:26:51 2007 Found NIC 1 at index 2!
>> Mon Sep 24 14:26:51 2007 map seems OK
>> Mon Sep 24 14:26:51 2007 Routing took 0 seconds
>> Mon Sep 24 14:35:17 2007 Requesting remap from indus4
>> (00:60:dd:47:b3:e8): scouted by 00:60:dd:47:b3:bf, lev=1, pkt_type=0
>> Mon Sep 24 14:35:19 2007 7 hosts, 11 nics, 6 xbars, 43 links
>> Mon Sep 24 14:35:19 2007 map version is 1987557553
>> Mon Sep 24 14:35:19 2007 Found NIC 0 at index 5!
>> Mon Sep 24 14:35:19 2007 Found NIC 1 at index 4!
>> Mon Sep 24 14:35:19 2007 map seems OK
>> Mon Sep 24 14:35:19 2007 Routing took 0 seconds
>> Tue Sep 25 21:47:52 2007 6 hosts, 9 nics, 6 xbars, 41 links
>> Tue Sep 25 21:47:52 2007 map version is 1987557554
>> Tue Sep 25 21:47:52 2007 Found NIC 0 at index 3!
>> Tue Sep 25 21:47:52 2007 Found NIC 1 at index 2!
>> Tue Sep 25 21:47:52 2007 map seems OK
>> Tue Sep 25 21:47:52 2007 Routing took 0 seconds
>> Tue Sep 25 21:52:02 2007 Requesting remap from indus4
>> (00:60:dd:47:b3:e8): empty port x0p15 is no longer empty
>> Tue Sep 25 21:52:07 2007 6 hosts, 10 nics, 6 xbars, 42 links
>> Tue Sep 25 21:52:07 2007 map version is 1987557555
>> Tue Sep 25 21:52:07 2007 Found NIC 0 at index 4!
>> Tue Sep 25 21:52:07 2007 Found NIC 1 at index 3!
>> Tue Sep 25 21:52:07 2007 map seems OK
>> Tue Sep 25 21:52:07 2007 Routing took 0 seconds
>> Tue Sep 25 21:52:23 2007 7 hosts, 11 nics, 6 xbars, 43 links
>> Tue Sep 25 21:52:23 2007 map version is 1987557556
>> Tue Sep 25 21:52:23 2007 Found NIC 0 at index 6!
>> Tue Sep 25 21:52:23 2007 Found NIC 1 at index 5!
>> Tue Sep 25 21:52:23 2007 map seems OK
>> Tue Sep 25 21:52:23 2007 Routing took 0 seconds
>> Wed Sep 26 05:07:01 2007 Requesting remap from indus4
>> (00:60:dd:47:b3:e8): verify failed x1p2, nic 0, port 0 route=-9 4 10
>> reply=-10 -4 9 , remote=ravi2 NIC
>> 1, p0 mac=00:60:dd:47:ad:5f
>> Wed Sep 26 05:07:06 2007 6 hosts, 9 nics, 6 xbars, 41 links
>> Wed Sep 26 05:07:06 2007 map version is 1987557557
>> Wed Sep 26 05:07:06 2007 Found NIC 0 at index 3!
>> Wed Sep 26 05:07:06 2007 Found NIC 1 at index 2!
>> Wed Sep 26 05:07:06 2007 map seems OK
>> Wed Sep 26 05:07:06 2007 Routing took 0 seconds
>> Wed Sep 26 05:11:19 2007 7 hosts, 11 nics, 6 xbars, 43 links
>> Wed Sep 26 05:11:19 2007 map version is 1987557558
>> Wed Sep 26 05:11:19 2007 Found NIC 0 at index 3!
>> Wed Sep 26 05:11:19 2007 Found NIC 1 at index 2!
>> Wed Sep 26 05:11:19 2007 map seems OK
>> Wed Sep 26 05:11:19 2007 Routing took 0 seconds
>> Thu Sep 27 11:45:37 2007 6 hosts, 9 nics, 6 xbars, 41 links
>> Thu Sep 27 11:45:37 2007 map version is 1987557559
>> Thu Sep 27 11:45:37 2007 Found NIC 0 at index 6!
>> Thu Sep 27 11:45:37 2007 Found NIC 1 at index 5!
>> Thu Sep 27 11:45:37 2007 map seems OK
>> Thu Sep 27 11:45:37 2007 Routing took 0 seconds
>> Thu Sep 27 11:51:02 2007 7 hosts, 11 nics, 6 xbars, 43 links
>> Thu Sep 27 11:51:02 2007 map version is 1987557560
>> Thu Sep 27 11:51:02 2007 Found NIC 0 at index 6!
>> Thu Sep 27 11:51:02 2007 Found NIC 1 at index 5!
>> Thu Sep 27 11:51:02 2007 map seems OK
>> Thu Sep 27 11:51:02 2007 Routing took 0 seconds
>> Fri Sep 28 13:27:10 2007 Requesting remap from indus4
>> (00:60:dd:47:b3:e8): verify failed x5p0, nic 1, port 0 route=-8 15 6
>> reply=-6 -15 8 , remote=ravi1 NIC
>> 0, p0 mac=00:60:dd:47:b3:bf
>> Fri Sep 28 13:27:24 2007 6 hosts, 8 nics, 6 xbars, 40 links
>> Fri Sep 28 13:27:24 2007 map version is 1987557561
>> Fri Sep 28 13:27:24 2007 Found NIC 0 at index 5!
>> Fri Sep 28 13:27:24 2007 Cannot find NIC 1 (00:60:dd:47:ad:5e) in map!
>> Fri Sep 28 13:27:24 2007 map seems OK
>> Fri Sep 28 13:27:24 2007 Routing took 0 seconds
>> Fri Sep 28 13:27:44 2007 7 hosts, 10 nics, 6 xbars, 42 links
>> Fri Sep 28 13:27:44 2007 map version is 1987557562
>> Fri Sep 28 13:27:44 2007 Found NIC 0 at index 7!
>> Fri Sep 28 13:27:44 2007 Cannot find NIC 1 (00:60:dd:47:ad:5e) in map!
>> Fri Sep 28 13:27:44 2007 map seems OK
>> Fri Sep 28 13:27:44 2007 Routing took 0 seconds
>>
>> Do you have any suggestions or comments on why this error appears and what
>> the solution might be? I have searched the community mailing list for
>> this problem and found a few related topics, but could not find any
>> solution. Any suggestions or comments will be highly appreciated.
>>
>> The code that I am trying to run is as follows:
>>
>> #include <stdio.h>
>> #include <string.h>   /* for strcpy */
>> #include "mpi.h"
>>
>> int main(int argc, char **argv)
>> {
>>     int rank, size, tag, rc, i;
>>     MPI_Status status;
>>     char message[20];
>>
>>     rc = MPI_Init(&argc, &argv);
>>     rc = MPI_Comm_size(MPI_COMM_WORLD, &size);
>>     rc = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>     tag = 100;
>>
>>     if (rank == 0) {
>>         /* Rank 0 sends the greeting to every other rank. */
>>         strcpy(message, "Hello, world");
>>         for (i = 1; i < size; i++)
>>             rc = MPI_Send(message, 13, MPI_CHAR, i, tag, MPI_COMM_WORLD);
>>     } else {
>>         /* Every other rank receives the greeting from rank 0. */
>>         rc = MPI_Recv(message, 13, MPI_CHAR, 0, tag, MPI_COMM_WORLD,
>>                       &status);
>>     }
>>
>>     printf("node %d : %.13s\n", rank, message);
>>     rc = MPI_Finalize();
>>     return 0;
>> }
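>>
>> For completeness, this is roughly how I build and launch it (I am assuming
>> the mpicc wrapper shipped with the ClusterTools installation; the path may
>> need adjusting):
>>
>> /opt/SUNWhpc/HPC7.0/bin/mpicc hello.c -o hello
>> /opt/SUNWhpc/HPC7.0/bin/mpirun -np 4 -mca btl mx,sm,self -machinefile machines ./hello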
>>
>> Thanks.
>> Looking forward.
>> Best regards,
>> Hammad Siddiqi
>> Center for High Performance Scientific Computing
>> NUST Institute of Information Technology,
>> National University of Sciences and Technology,
>> Rawalpindi, Pakistan.
>>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
