
Open MPI User's Mailing List Archives


From: Hammad Siddiqi (hammad.siddiqi_at_[hidden])
Date: 2007-10-01 02:00:21


Dear Tim,

Your and Tim Mattox's suggestions yielded the following results:

*1. /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host
"indus1,indus2" -mca btl_base_debug 1000 ./hello*

/opt/SUNWhpc/HPC7.0/bin/mpirun -np 4 -mca btl mx,sm,self -host
"indus1,indus2,indus3,indus4" -mca btl_base_debug 1000 ./hello
[indus1:29331] select: initializing btl component mx
[indus1:29331] select: init returned failure
[indus1:29331] select: module mx unloaded
[indus1:29331] select: initializing btl component sm
[indus1:29331] select: init returned success
[indus1:29331] select: initializing btl component self
[indus1:29331] select: init returned success
[indus3:13520] select: initializing btl component mx
[indus3:13520] select: init returned failure
[indus3:13520] select: module mx unloaded
[indus3:13520] select: initializing btl component sm
[indus3:13520] select: init returned success
[indus3:13520] select: initializing btl component self
[indus3:13520] select: init returned success
[indus4:15486] select: initializing btl component mx
[indus4:15486] select: init returned failure
[indus4:15486] select: module mx unloaded
[indus4:15486] select: initializing btl component sm
[indus4:15486] select: init returned success
[indus4:15486] select: initializing btl component self
[indus4:15486] select: init returned success
[indus2:11351] select: initializing btl component mx
[indus2:11351] select: init returned failure
[indus2:11351] select: module mx unloaded
[indus2:11351] select: initializing btl component sm
[indus2:11351] select: init returned success
[indus2:11351] select: initializing btl component self
[indus2:11351] select: init returned success
--------------------------------------------------------------------------
Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--------------------------------------------------------------------------
Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Process 0.1.2 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during
MPI_INIT--------------------------------------------------------------------------
Process 0.1.3 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
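
So the MX BTL fails to initialize on every node while TCP works. It may
be worth inspecting the MX BTL's run-time parameters; a diagnostic
sketch, assuming the Sun-packaged ompi_info accepts the standard
--param option:

/opt/SUNWhpc/HPC7.0/bin/ompi_info --param btl mx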

*2.1 /opt/SUNWhpc/HPC7.0/bin/mpirun -np 4 -mca mtl mx -host
"indus1,indus2,indus3,indus4" ./hello*

This command works fine.

*2.2 /opt/SUNWhpc/HPC7.0/bin/mpirun -np 4 -mca mtl mx -host
"indus1,indus2,indus3,indus4" -mca pml cm ./hello*

This command works fine.
Also *"/opt/SUNWhpc/HPC7.0/bin/mpirun -np 4 -mca pml cm -host
"indus1,indus2,indus3,indus4" -mca mtl_base_debug 1000 ./hello"*, this
command works fine.
but *"/opt/SUNWhpc/HPC7.0/bin/mpirun -np 8 -mca pml cm -host
"indus1,indus2,indus3,indus4" -mca mtl_base_debug 1000 ./hello"* hangs
for indefinite time.

Also *"/opt/SUNWhpc/HPC7.0/bin/mpirun -np 8 -mca mtl mx,sm,self -host
"indus1,indus2,indus3,indus4" -mca mtl_base_debug 1000 ./hello"* works
fine
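
To check whether the more-than-four-process hang is specific to MX, the
same run could be attempted over the TCP BTL. This is only a sketch
based on the invocation that worked earlier in this thread, extended to
8 processes; I have not confirmed this exact run:

/opt/SUNWhpc/HPC7.0/bin/mpirun -np 8 -mca btl tcp,sm,self \
    -machinefile machines -mca btl_base_debug 1000 ./hello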

*2.3 /opt/SUNWhpc/HPC7.0/bin/mpirun -np 8 -mca mtl mx -host
"indus1,indus2,indus3,indus4" -mca pml cm ./hello*

This command hangs the machines indefinitely.
Also, *"/opt/SUNWhpc/HPC7.0/bin/mpirun -np 8 -mca mtl mx -host
"indus1,indus2,indus3,indus4" -mca pml cm -mca mtl_base_debug 1000
./hello"* hangs the systems indefinitely.

*2.4 /opt/SUNWhpc/HPC7.0/bin/mpirun -np 8 -mca mtl mx,sm,self -host
"indus1,indus2,indus3,indus4" -mca pml cm -mca mtl_base_debug 1000 ./hello*

This command hangs the machines indefinitely.
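
For reference, these are the two cleanest invocations reported working
above, restated in consolidated form (a recap of earlier results, not a
new test):

/opt/SUNWhpc/HPC7.0/bin/mpirun -np 4 -mca btl tcp,sm,self \
    -machinefile machines ./hello
/opt/SUNWhpc/HPC7.0/bin/mpirun -np 4 -mca pml cm -mca mtl mx \
    -host "indus1,indus2,indus3,indus4" ./hello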

Please note that running more than four MPI processes hangs the
machines. Any suggestions would be appreciated.

Thanks,

Best Regards,
Hammad Siddiqi

Tim Prins wrote:
> I would recommend trying a few things:
>
> 1. Set some debugging flags and see if that helps. So, I would try something
> like:
> /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,self -host "indus1,indus2" -mca btl_base_debug 1000 ./hello
>
> This will output information as each btl is loaded, and whether or not the
> load succeeds.
>
> 2. Try running with the mx mtl instead of the btl:
> /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca mtl mx -host "indus1,indus2" ./hello
>
> Similarly, for debug output:
> /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca mtl mx -host "indus1,indus2" -mca
> mtl_base_debug 1000 ./hello
>
> Let me know if any of these work.
>
> Thanks,
>
> Tim
>
> On Saturday 29 September 2007 01:53:06 am Hammad Siddiqi wrote:
>
>> Hi Terry,
>>
>> Thanks for replying. The following command is working fine:
>>
>> /opt/SUNWhpc/HPC7.0/bin/mpirun -np 4 -mca btl tcp,sm,self -machinefile
>> machines ./hello
>>
>> The contents of machines are:
>> indus1
>> indus2
>> indus3
>> indus4
>>
>> I have tried using np=2 over pairs of machines, but the problem is the
>> same. The errors that occur are given below with the commands that I tried.
>>
>> *Test 1*
>>
>> /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host
>> "indus1,indus2" ./hello
>> --------------------------------------------------------------------------
>> Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
>> If you specified the use of a BTL component, you may have
>> forgotten a component (such as "self") in the list of
>> usable components.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or environment
>> problems. This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>>
>> PML add procs failed
>> --> Returned "Unreachable" (-12) instead of "Success" (0)
>> --------------------------------------------------------------------------
>> *** An error occurred in MPI_Init
>> *** before MPI was initialized
>> *** MPI_ERRORS_ARE_FATAL (goodbye)
>> --------------------------------------------------------------------------
>> Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
>> If you specified the use of a BTL component, you may have
>> forgotten a component (such as "self") in the list of
>> usable components.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or environment
>> problems. This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>>
>> PML add procs failed
>> --> Returned "Unreachable" (-12) instead of "Success" (0)
>> --------------------------------------------------------------------------
>> *** An error occurred in MPI_Init
>> *** before MPI was initialized
>> *** MPI_ERRORS_ARE_FATAL (goodbye)
>>
>> *Test 2*
>>
>> /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host
>> "indus1,indus3" ./hello
>> --------------------------------------------------------------------------
>> Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
>> If you specified the use of a BTL component, you may have
>> forgotten a component (such as "self") in the list of
>> usable components.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or environment
>> problems. This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>>
>> PML add procs failed
>> --> Returned "Unreachable" (-12) instead of "Success" (0)
>> --------------------------------------------------------------------------
>> *** An error occurred in MPI_Init
>> *** before MPI was initialized
>> *** MPI_ERRORS_ARE_FATAL (goodbye)
>> --------------------------------------------------------------------------
>> Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
>> If you specified the use of a BTL component, you may have
>> forgotten a component (such as "self") in the list of
>> usable components.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or environment
>> problems. This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>>
>> PML add procs failed
>> --> Returned "Unreachable" (-12) instead of "Success" (0)
>> --------------------------------------------------------------------------
>> *** An error occurred in MPI_Init
>> *** before MPI was initialized
>> *** MPI_ERRORS_ARE_FATAL (goodbye)
>> *Test 3*
>>
>> /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host
>> "indus1,indus4" ./hello
>> --------------------------------------------------------------------------
>> Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
>> If you specified the use of a BTL component, you may have
>> forgotten a component (such as "self") in the list of
>> usable components.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or environment
>> problems. This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>>
>> PML add procs failed
>> --> Returned "Unreachable" (-12) instead of "Success" (0)
>> --------------------------------------------------------------------------
>> *** An error occurred in MPI_Init
>> *** before MPI was initialized
>> *** MPI_ERRORS_ARE_FATAL (goodbye)
>> --------------------------------------------------------------------------
>> Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
>> If you specified the use of a BTL component, you may have
>> forgotten a component (such as "self") in the list of
>> usable components.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or environment
>> problems. This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>>
>> PML add procs failed
>> --> Returned "Unreachable" (-12) instead of "Success" (0)
>> --------------------------------------------------------------------------
>> *** An error occurred in MPI_Init
>> *** before MPI was initialized
>> *** MPI_ERRORS_ARE_FATAL (goodbye)
>>
>> *Test 4*
>>
>> /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host
>> "indus2,indus4" ./hello
>> --------------------------------------------------------------------------
>> Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
>> If you specified the use of a BTL component, you may have
>> forgotten a component (such as "self") in the list of
>> usable components.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or environment
>> problems. This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>>
>> PML add procs failed
>> --> Returned "Unreachable" (-12) instead of "Success" (0)
>> --------------------------------------------------------------------------
>> *** An error occurred in MPI_Init
>> *** before MPI was initialized
>> *** MPI_ERRORS_ARE_FATAL (goodbye)
>> --------------------------------------------------------------------------
>> Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
>> If you specified the use of a BTL component, you may have
>> forgotten a component (such as "self") in the list of
>> usable components.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or environment
>> problems. This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>>
>> PML add procs failed
>> --> Returned "Unreachable" (-12) instead of "Success" (0)
>> --------------------------------------------------------------------------
>> *** An error occurred in MPI_Init
>> *** before MPI was initialized
>> *** MPI_ERRORS_ARE_FATAL (goodbye)
>>
>> *Test 5*
>>
>> /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host
>> "indus2,indus3" ./hello
>> --------------------------------------------------------------------------
>> Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
>> If you specified the use of a BTL component, you may have
>> forgotten a component (such as "self") in the list of
>> usable components.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or environment
>> problems. This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>>
>> PML add procs failed
>> --> Returned "Unreachable" (-12) instead of "Success" (0)
>> --------------------------------------------------------------------------
>> *** An error occurred in MPI_Init
>> *** before MPI was initialized
>> *** MPI_ERRORS_ARE_FATAL (goodbye)
>> --------------------------------------------------------------------------
>> Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
>> If you specified the use of a BTL component, you may have
>> forgotten a component (such as "self") in the list of
>> usable components.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or environment
>> problems. This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>>
>> PML add procs failed
>> --> Returned "Unreachable" (-12) instead of "Success" (0)
>> --------------------------------------------------------------------------
>> *** An error occurred in MPI_Init
>> *** before MPI was initialized
>> *** MPI_ERRORS_ARE_FATAL (goodbye)
>>
>> *Test 6*
>>
>> /opt/SUNWhpc/HPC7.0/bin/mpirun -np 2 -mca btl mx,sm,self -host
>> "indus3,indus4" ./hello
>> --------------------------------------------------------------------------
>> Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
>> If you specified the use of a BTL component, you may have
>> forgotten a component (such as "self") in the list of
>> usable components.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or environment
>> problems. This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>>
>> PML add procs failed
>> --> Returned "Unreachable" (-12) instead of "Success" (0)
>> --------------------------------------------------------------------------
>> *** An error occurred in MPI_Init
>> *** before MPI was initialized
>> *** MPI_ERRORS_ARE_FATAL (goodbye)
>> --------------------------------------------------------------------------
>> Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
>> If you specified the use of a BTL component, you may have
>> forgotten a component (such as "self") in the list of
>> usable components.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or environment
>> problems. This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>>
>> PML add procs failed
>> --> Returned "Unreachable" (-12) instead of "Success" (0)
>> --------------------------------------------------------------------------
>> *** An error occurred in MPI_Init
>> *** before MPI was initialized
>> *** MPI_ERRORS_ARE_FATAL (goodbye)
>>
>> *END OF TESTS*
>>
>> One thing to note: when I run this command including -mca
>> pml cm, it works fine :S
>>
>> mpirun -np 4 -mca btl mx,sm,self -mca pml cm -machinefile machines ./hello
>> Hello MPI! Process 4 of 1 on indus2
>> Hello MPI! Process 4 of 2 on indus3
>> Hello MPI! Process 4 of 3 on indus4
>> Hello MPI! Process 4 of 0 on indus1
>>
>> To my knowledge this command is not using shared memory and is only
>> using Myrinet as the interconnect.
>> One more thing: I cannot start more than 4 processes in this case; the
>> mpirun process hangs.
>>
>> Any suggestions?
>>
>> Once again, thanks for your help.
>>
>> Regards,
>> Hammad
>>
>> Terry Dontje wrote:
>>
>>> Hi Hammad,
>>>
>>> It looks to me like none of the btl's could resolve a route from the
>>> node that process rank 0 is on to the other nodes.
>>> I would suggest trying np=2 over a couple of pairs of machines to see
>>> if that works, so you can truly be sure whether only the
>>> first node is having this problem.
>>>
>>> It also might be helpful as a sanity check to use the tcp btl instead of
>>> mx and see if you get more traction with that.
>>>
>>> --td
>>>
>>>
>>>> *From:* Hammad Siddiqi (/hammad.siddiqi_at_[hidden]/)
>>>> *Date:* 2007-09-28 07:38:01
>>>>
>>>> Hello,
>>>>
>>>> I am using Sun HPC Toolkit 7.0 to compile and run my C MPI programs.
>>>>
>>>> I have tested the Myrinet installation using Myricom's own test
>>>> programs. The Myricom software stack I am using is MX, version
>>>> mx2g-1.1.7; mx_mapper is also used.
>>>> We have 4 nodes (Sun Fire V890), each with 8 dual-core processors, and
>>>> the operating system is
>>>> Solaris 10 (SunOS indus1 5.10 Generic_125100-10 sun4u sparc
>>>> SUNW,Sun-Fire-V890).
>>>>
>>>> The contents of machine file are:
>>>> indus1
>>>> indus2
>>>> indus3
>>>> indus4
>>>>
>>>> The output of *mx_info* on each node is given below:
>>>>
>>>> ======
>>>> *indus1*
>>>> ======
>>>>
>>>> MX Version: 1.1.7rc3cvs1_1_fixes
>>>> MX Build: @indus4:/opt/mx2g-1.1.7rc3 Thu May 31 11:36:59 PKT 2007
>>>> 2 Myrinet boards installed.
>>>> The MX driver is configured to support up to 4 instances and 1024 nodes.
>>>> ===================================================================
>>>> Instance #0: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
>>>> Status: Running, P0: Link up
>>>> MAC Address: 00:60:dd:47:ad:7c
>>>> Product code: M3F-PCIXF-2
>>>> Part number: 09-03392
>>>> Serial number: 297218
>>>> Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
>>>> Mapped hosts: 10
>>>>
>>>>
>>>> ROUTE COUNT
>>>> INDEX MAC ADDRESS HOST NAME P0
>>>> ----- ----------- --------- ---
>>>> 0) 00:60:dd:47:ad:7c indus1:0 1,1
>>>> 2) 00:60:dd:47:ad:68 indus4:0 8,3
>>>> 3) 00:60:dd:47:b3:e8 indus4:1 7,3
>>>> 4) 00:60:dd:47:b3:ab indus2:0 7,3
>>>> 5) 00:60:dd:47:ad:66 indus3:0 8,3
>>>> 6) 00:60:dd:47:ad:76 indus3:1 8,3
>>>> 7) 00:60:dd:47:ad:77 jhelum1:0 8,3
>>>> 8) 00:60:dd:47:b3:5a ravi2:0 8,3
>>>> 9) 00:60:dd:47:ad:5f ravi2:1 1,1
>>>> 10) 00:60:dd:47:b3:bf ravi1:0 8,3
>>>> ===================================================================
>>>>
>>>> ======
>>>> *indus2*
>>>> ======
>>>>
>>>> MX Version: 1.1.7rc3cvs1_1_fixes
>>>> MX Build: @indus2:/opt/mx2g-1.1.7rc3 Thu May 31 11:24:03 PKT 2007
>>>> 2 Myrinet boards installed.
>>>> The MX driver is configured to support up to 4 instances and 1024 nodes.
>>>> ===================================================================
>>>> Instance #0: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
>>>> Status: Running, P0: Link up
>>>> MAC Address: 00:60:dd:47:b3:ab
>>>> Product code: M3F-PCIXF-2
>>>> Part number: 09-03392
>>>> Serial number: 296636
>>>> Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
>>>> Mapped hosts: 10
>>>>
>>>> ROUTE COUNT
>>>> INDEX MAC ADDRESS HOST NAME P0
>>>> ----- ----------- --------- ---
>>>> 0) 00:60:dd:47:b3:ab indus2:0 1,1
>>>> 2) 00:60:dd:47:ad:68 indus4:0 1,1
>>>> 3) 00:60:dd:47:b3:e8 indus4:1 8,3
>>>> 4) 00:60:dd:47:ad:66 indus3:0 1,1
>>>> 5) 00:60:dd:47:ad:76 indus3:1 7,3
>>>> 6) 00:60:dd:47:ad:77 jhelum1:0 7,3
>>>> 8) 00:60:dd:47:ad:7c indus1:0 8,3
>>>> 9) 00:60:dd:47:b3:5a ravi2:0 8,3
>>>> 10) 00:60:dd:47:ad:5f ravi2:1 8,3
>>>> 11) 00:60:dd:47:b3:bf ravi1:0 7,3
>>>> ===================================================================
>>>> Instance #1: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
>>>> Status: Running, P0: Link down
>>>> MAC Address: 00:60:dd:47:b3:c3
>>>> Product code: M3F-PCIXF-2
>>>> Part number: 09-03392
>>>> Serial number: 296612
>>>> Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
>>>> Mapped hosts: 10
>>>>
>>>> ======
>>>> *indus3*
>>>> ======
>>>> MX Version: 1.1.7rc3cvs1_1_fixes
>>>> MX Build: @indus3:/opt/mx2g-1.1.7rc3 Thu May 31 11:29:03 PKT 2007
>>>> 2 Myrinet boards installed.
>>>> The MX driver is configured to support up to 4 instances and 1024 nodes.
>>>> ===================================================================
>>>> Instance #0: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
>>>> Status: Running, P0: Link up
>>>> MAC Address: 00:60:dd:47:ad:66
>>>> Product code: M3F-PCIXF-2
>>>> Part number: 09-03392
>>>> Serial number: 297240
>>>> Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
>>>> Mapped hosts: 10
>>>>
>>>> ROUTE COUNT
>>>> INDEX MAC ADDRESS HOST NAME P0
>>>> ----- ----------- --------- ---
>>>> 0) 00:60:dd:47:ad:66 indus3:0 1,1
>>>> 1) 00:60:dd:47:ad:76 indus3:1 8,3
>>>> 2) 00:60:dd:47:ad:68 indus4:0 1,1
>>>> 3) 00:60:dd:47:b3:e8 indus4:1 6,3
>>>> 4) 00:60:dd:47:ad:77 jhelum1:0 8,3
>>>> 5) 00:60:dd:47:b3:ab indus2:0 1,1
>>>> 7) 00:60:dd:47:ad:7c indus1:0 8,3
>>>> 8) 00:60:dd:47:b3:5a ravi2:0 8,3
>>>> 9) 00:60:dd:47:ad:5f ravi2:1 7,3
>>>> 10) 00:60:dd:47:b3:bf ravi1:0 8,3
>>>> ===================================================================
>>>> Instance #1: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
>>>> Status: Running, P0: Link up
>>>> MAC Address: 00:60:dd:47:ad:76
>>>> Product code: M3F-PCIXF-2
>>>> Part number: 09-03392
>>>> Serial number: 297224
>>>> Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
>>>> Mapped hosts: 10
>>>>
>>>> ROUTE COUNT
>>>> INDEX MAC ADDRESS HOST NAME P0
>>>> ----- ----------- --------- ---
>>>> 0) 00:60:dd:47:ad:66 indus3:0 8,3
>>>> 1) 00:60:dd:47:ad:76 indus3:1 1,1
>>>> 2) 00:60:dd:47:ad:68 indus4:0 7,3
>>>> 3) 00:60:dd:47:b3:e8 indus4:1 1,1
>>>> 4) 00:60:dd:47:ad:77 jhelum1:0 1,1
>>>> 5) 00:60:dd:47:b3:ab indus2:0 7,3
>>>> 7) 00:60:dd:47:ad:7c indus1:0 8,3
>>>> 8) 00:60:dd:47:b3:5a ravi2:0 6,3
>>>> 9) 00:60:dd:47:ad:5f ravi2:1 8,3
>>>> 10) 00:60:dd:47:b3:bf ravi1:0 8,3
>>>>
>>>> ======
>>>> *indus4*
>>>> ======
>>>>
>>>> MX Version: 1.1.7rc3cvs1_1_fixes
>>>> MX Build: @indus4:/opt/mx2g-1.1.7rc3 Thu May 31 11:36:59 PKT 2007
>>>> 2 Myrinet boards installed.
>>>> The MX driver is configured to support up to 4 instances and 1024 nodes.
>>>> ===================================================================
>>>> Instance #0: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
>>>> Status: Running, P0: Link up
>>>> MAC Address: 00:60:dd:47:ad:68
>>>> Product code: M3F-PCIXF-2
>>>> Part number: 09-03392
>>>> Serial number: 297238
>>>> Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
>>>> Mapped hosts: 10
>>>>
>>>> ROUTE COUNT
>>>> INDEX MAC ADDRESS HOST NAME P0
>>>> ----- ----------- --------- ---
>>>> 0) 00:60:dd:47:ad:68 indus4:0 1,1
>>>> 1) 00:60:dd:47:b3:e8 indus4:1 7,3
>>>> 2) 00:60:dd:47:ad:77 jhelum1:0 7,3
>>>> 3) 00:60:dd:47:ad:66 indus3:0 1,1
>>>> 4) 00:60:dd:47:ad:76 indus3:1 7,3
>>>> 5) 00:60:dd:47:b3:ab indus2:0 1,1
>>>> 7) 00:60:dd:47:ad:7c indus1:0 7,3
>>>> 8) 00:60:dd:47:b3:5a ravi2:0 7,3
>>>> 9) 00:60:dd:47:ad:5f ravi2:1 8,3
>>>> 10) 00:60:dd:47:b3:bf ravi1:0 7,3
>>>> ===================================================================
>>>> Instance #1: 333.2 MHz LANai, 66.7 MHz PCI bus, 2 MB SRAM
>>>> Status: Running, P0: Link up
>>>> MAC Address: 00:60:dd:47:b3:e8
>>>> Product code: M3F-PCIXF-2
>>>> Part number: 09-03392
>>>> Serial number: 296575
>>>> Mapper: 00:60:dd:47:b3:e8, version = 0x7677b8ba, configured
>>>> Mapped hosts: 10
>>>>
>>>> ROUTE COUNT
>>>> INDEX MAC ADDRESS HOST NAME P0
>>>> ----- ----------- --------- ---
>>>> 0) 00:60:dd:47:ad:68 indus4:0 6,3
>>>> 1) 00:60:dd:47:b3:e8 indus4:1 1,1
>>>> 2) 00:60:dd:47:ad:77 jhelum1:0 1,1
>>>> 3) 00:60:dd:47:ad:66 indus3:0 8,3
>>>> 4) 00:60:dd:47:ad:76 indus3:1 1,1
>>>> 5) 00:60:dd:47:b3:ab indus2:0 8,3
>>>> 7) 00:60:dd:47:ad:7c indus1:0 7,3
>>>> 8) 00:60:dd:47:b3:5a ravi2:0 6,3
>>>> 9) 00:60:dd:47:ad:5f ravi2:1 8,3
>>>> 10) 00:60:dd:47:b3:bf ravi1:0 8,3
>>>>
>>>> The output from *ompi_info* is:
>>>>
>>>> Open MPI: 1.2.1r14096-ct7b030r1838
>>>> Open MPI SVN revision: 0
>>>> Open RTE: 1.2.1r14096-ct7b030r1838
>>>> Open RTE SVN revision: 0
>>>> OPAL: 1.2.1r14096-ct7b030r1838
>>>> OPAL SVN revision: 0
>>>> Prefix: /opt/SUNWhpc/HPC7.0
>>>> Configured architecture: sparc-sun-solaris2.10
>>>> Configured by: root
>>>> Configured on: Fri Mar 30 12:49:36 EDT 2007
>>>> Configure host: burpen-on10-0
>>>> Built by: root
>>>> Built on: Fri Mar 30 13:10:46 EDT 2007
>>>> Built host: burpen-on10-0
>>>> C bindings: yes
>>>> C++ bindings: yes
>>>> Fortran77 bindings: yes (all)
>>>> Fortran90 bindings: yes
>>>> Fortran90 bindings size: trivial
>>>> C compiler: cc
>>>> C compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/cc
>>>> C++ compiler: CC
>>>> C++ compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/CC
>>>> Fortran77 compiler: f77
>>>> Fortran77 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f77
>>>> Fortran90 compiler: f95
>>>> Fortran90 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f95
>>>> C profiling: yes
>>>> C++ profiling: yes
>>>> Fortran77 profiling: yes
>>>> Fortran90 profiling: yes
>>>> C++ exceptions: yes
>>>> Thread support: no
>>>> Internal debug support: no
>>>> MPI parameter check: runtime
>>>> Memory profiling support: no
>>>> Memory debugging support: no
>>>> libltdl support: yes
>>>> Heterogeneous support: yes
>>>> mpirun default --prefix: yes
>>>> MCA backtrace: printstack (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA paffinity: solaris (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA timer: solaris (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
>>>> MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
>>>> MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA coll: self (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA io: romio (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA mpool: udapl (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA rcache: rb (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA btl: mx (MCA v1.0, API v1.0.1, Component v1.2.1)
>>>> MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.1)
>>>> MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.1)
>>>> MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
>>>> MCA btl: udapl (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA mtl: mx (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.1)
>>>> MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.1)
>>>> MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.1)
>>>> MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.1)
>>>> MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.1)
>>>> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
>>>> MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.1)
>>>> MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.1)
>>>> MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.1)
>>>> MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.1)
>>>> MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.1)
>>>> MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.1)
>>>> MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.1)
>>>> MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2.1)
>>>> MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.1)
>>>> MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.1)
>>>> MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.1)
>>>> MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.1)
>>>> MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.1)
>>>> MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.1)
>>>> MCA sds: env (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.1)
>>>> MCA sds: singleton (MCA v1.0, API v1.0, Component v1.2.1)
>>>>
>>>> When I try to run a simple hello world program by issuing the
>>>> following command:
>>>>
>>>> *mpirun -np 4 -mca btl mx,sm,self -machinefile machines ./hello*
>>>>
>>>> The following error appears:
>>>>
>>>> --------------------------------------------------------------------------
>>>> Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
>>>> If you specified the use of a BTL component, you may have
>>>> forgotten a component (such as "self") in the list of
>>>> usable components.
>>>> --------------------------------------------------------------------------
>>>> --------------------------------------------------------------------------
>>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>>> likely to abort. There are many reasons that a parallel process can
>>>> fail during MPI_INIT; some of which are due to configuration or environment
>>>> problems. This failure appears to be an internal failure; here's some
>>>> additional information (which may only be relevant to an Open MPI
>>>> developer):
>>>>
>>>> PML add procs failed
>>>> --> Returned "Unreachable" (-12) instead of "Success" (0)
>>>> --------------------------------------------------------------------------
>>>> *** An error occurred in MPI_Init
>>>> *** before MPI was initialized
>>>> *** MPI_ERRORS_ARE_FATAL (goodbye)
>>>> --------------------------------------------------------------------------
>>>> Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
>>>> If you specified the use of a BTL component, you may have
>>>> forgotten a component (such as "self") in the list of
>>>> usable components.
>>>> --------------------------------------------------------------------------
>>>> --------------------------------------------------------------------------
>>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>>> likely to abort. There are many reasons that a parallel process can
>>>> fail during MPI_INIT; some of which are due to configuration or environment
>>>> problems. This failure appears to be an internal failure; here's some
>>>> additional information (which may only be relevant to an Open MPI
>>>> developer):
>>>>
>>>> PML add procs failed
>>>> --> Returned "Unreachable" (-12) instead of "Success" (0)
>>>> --------------------------------------------------------------------------
>>>> *** An error occurred in MPI_Init
>>>> *** before MPI was initialized
>>>> *** MPI_ERRORS_ARE_FATAL (goodbye)
>>>> --------------------------------------------------------------------------
>>>> Process 0.1.3 is unable to reach 0.1.0 for MPI communication.
>>>> If you specified the use of a BTL component, you may have
>>>> forgotten a component (such as "self") in the list of
>>>> usable components.
>>>> --------------------------------------------------------------------------
>>>> --------------------------------------------------------------------------
>>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>>> likely to abort. There are many reasons that a parallel process can
>>>> fail during MPI_INIT; some of which are due to configuration or environment
>>>> problems. This failure appears to be an internal failure; here's some
>>>> additional information (which may only be relevant to an Open MPI
>>>> developer):
>>>>
>>>> PML add procs failed
>>>> --> Returned "Unreachable" (-12) instead of "Success" (0)
>>>> --------------------------------------------------------------------------
>>>> --------------------------------------------------------------------------
>>>> Process 0.1.2 is unable to reach 0.1.0 for MPI communication.
>>>> If you specified the use of a BTL component, you may have
>>>> forgotten a component (such as "self") in the list of
>>>> usable components.
>>>> --------------------------------------------------------------------------
>>>> --------------------------------------------------------------------------
>>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>>> likely to abort. There are many reasons that a parallel process can
>>>> fail during MPI_INIT; some of which are due to configuration or environment
>>>> problems. This failure appears to be an internal failure; here's some
>>>> additional information (which may only be relevant to an Open MPI
>>>> developer):
>>>>
>>>> PML add procs failed
>>>> --> Returned "Unreachable" (-*** An error occurred in MPI_Init
>>>> *** before MPI was initialized
>>>> *** MPI_ERRORS_ARE_FATAL (goodbye)
>>>> 12) instead of "Success" (0)
>>>> --------------------------------------------------------------------------
>>>> *** An error occurred in MPI_Init
>>>> *** before MPI was initialized
>>>> *** MPI_ERRORS_ARE_FATAL (goodbye)
>>>>
>>>> The output of *more /var/run/fms/fma.log* is:
>>>>
>>>> Sat Sep 22 10:47:50 2007 NIC 0: M3F-PCIXF-2 s/n=297218 1 ports, speed=2G
>>>> Sat Sep 22 10:47:50 2007 mac = 00:60:dd:47:ad:7c
>>>> Sat Sep 22 10:47:50 2007 NIC 1: M3F-PCIXF-2 s/n=297248 1 ports, speed=2G
>>>> Sat Sep 22 10:47:50 2007 mac = 00:60:dd:47:ad:5e
>>>> Sat Sep 22 10:47:50 2007 fms-1.2.1 fma starting
>>>> Sat Sep 22 10:47:50 2007 Mapper was 00:00:00:00:00:00, l=0, is now 00:60:dd:47:ad:7c, l=1
>>>> Sat Sep 22 10:47:50 2007 Mapping fabric...
>>>> Sat Sep 22 10:47:54 2007 Mapper was 00:60:dd:47:ad:7c, l=1, is now 00:60:dd:47:b3:e8, l=1
>>>> Sat Sep 22 10:47:54 2007 Cancelling mapping
>>>> Sat Sep 22 10:47:59 2007 5 hosts, 8 nics, 6 xbars, 40 links
>>>> Sat Sep 22 10:47:59 2007 map version is 1987557551
>>>> Sat Sep 22 10:47:59 2007 Found NIC 0 at index 3!
>>>> Sat Sep 22 10:47:59 2007 Found NIC 1 at index 2!
>>>> Sat Sep 22 10:47:59 2007 map seems OK
>>>> Sat Sep 22 10:47:59 2007 Routing took 0 seconds
>>>> Mon Sep 24 14:26:46 2007 Requesting remap from indus4 (00:60:dd:47:b3:e8): scouted by 00:60:dd:47:b3:5a, lev=1, pkt_type=0
>>>> Mon Sep 24 14:26:51 2007 6 hosts, 10 nics, 6 xbars, 42 links
>>>> Mon Sep 24 14:26:51 2007 map version is 1987557552
>>>> Mon Sep 24 14:26:51 2007 Found NIC 0 at index 3!
>>>> Mon Sep 24 14:26:51 2007 Found NIC 1 at index 2!
>>>> Mon Sep 24 14:26:51 2007 map seems OK
>>>> Mon Sep 24 14:26:51 2007 Routing took 0 seconds
>>>> Mon Sep 24 14:35:17 2007 Requesting remap from indus4 (00:60:dd:47:b3:e8): scouted by 00:60:dd:47:b3:bf, lev=1, pkt_type=0
>>>> Mon Sep 24 14:35:19 2007 7 hosts, 11 nics, 6 xbars, 43 links
>>>> Mon Sep 24 14:35:19 2007 map version is 1987557553
>>>> Mon Sep 24 14:35:19 2007 Found NIC 0 at index 5!
>>>> Mon Sep 24 14:35:19 2007 Found NIC 1 at index 4!
>>>> Mon Sep 24 14:35:19 2007 map seems OK
>>>> Mon Sep 24 14:35:19 2007 Routing took 0 seconds
>>>> Tue Sep 25 21:47:52 2007 6 hosts, 9 nics, 6 xbars, 41 links
>>>> Tue Sep 25 21:47:52 2007 map version is 1987557554
>>>> Tue Sep 25 21:47:52 2007 Found NIC 0 at index 3!
>>>> Tue Sep 25 21:47:52 2007 Found NIC 1 at index 2!
>>>> Tue Sep 25 21:47:52 2007 map seems OK
>>>> Tue Sep 25 21:47:52 2007 Routing took 0 seconds
>>>> Tue Sep 25 21:52:02 2007 Requesting remap from indus4 (00:60:dd:47:b3:e8): empty port x0p15 is no longer empty
>>>> Tue Sep 25 21:52:07 2007 6 hosts, 10 nics, 6 xbars, 42 links
>>>> Tue Sep 25 21:52:07 2007 map version is 1987557555
>>>> Tue Sep 25 21:52:07 2007 Found NIC 0 at index 4!
>>>> Tue Sep 25 21:52:07 2007 Found NIC 1 at index 3!
>>>> Tue Sep 25 21:52:07 2007 map seems OK
>>>> Tue Sep 25 21:52:07 2007 Routing took 0 seconds
>>>> Tue Sep 25 21:52:23 2007 7 hosts, 11 nics, 6 xbars, 43 links
>>>> Tue Sep 25 21:52:23 2007 map version is 1987557556
>>>> Tue Sep 25 21:52:23 2007 Found NIC 0 at index 6!
>>>> Tue Sep 25 21:52:23 2007 Found NIC 1 at index 5!
>>>> Tue Sep 25 21:52:23 2007 map seems OK
>>>> Tue Sep 25 21:52:23 2007 Routing took 0 seconds
>>>> Wed Sep 26 05:07:01 2007 Requesting remap from indus4 (00:60:dd:47:b3:e8): verify failed x1p2, nic 0, port 0 route=-9 4 10 reply=-10 -4 9 , remote=ravi2 NIC 1, p0 mac=00:60:dd:47:ad:5f
>>>> Wed Sep 26 05:07:06 2007 6 hosts, 9 nics, 6 xbars, 41 links
>>>> Wed Sep 26 05:07:06 2007 map version is 1987557557
>>>> Wed Sep 26 05:07:06 2007 Found NIC 0 at index 3!
>>>> Wed Sep 26 05:07:06 2007 Found NIC 1 at index 2!
>>>> Wed Sep 26 05:07:06 2007 map seems OK
>>>> Wed Sep 26 05:07:06 2007 Routing took 0 seconds
>>>> Wed Sep 26 05:11:19 2007 7 hosts, 11 nics, 6 xbars, 43 links
>>>> Wed Sep 26 05:11:19 2007 map version is 1987557558
>>>> Wed Sep 26 05:11:19 2007 Found NIC 0 at index 3!
>>>> Wed Sep 26 05:11:19 2007 Found NIC 1 at index 2!
>>>> Wed Sep 26 05:11:19 2007 map seems OK
>>>> Wed Sep 26 05:11:19 2007 Routing took 0 seconds
>>>> Thu Sep 27 11:45:37 2007 6 hosts, 9 nics, 6 xbars, 41 links
>>>> Thu Sep 27 11:45:37 2007 map version is 1987557559
>>>> Thu Sep 27 11:45:37 2007 Found NIC 0 at index 6!
>>>> Thu Sep 27 11:45:37 2007 Found NIC 1 at index 5!
>>>> Thu Sep 27 11:45:37 2007 map seems OK
>>>> Thu Sep 27 11:45:37 2007 Routing took 0 seconds
>>>> Thu Sep 27 11:51:02 2007 7 hosts, 11 nics, 6 xbars, 43 links
>>>> Thu Sep 27 11:51:02 2007 map version is 1987557560
>>>> Thu Sep 27 11:51:02 2007 Found NIC 0 at index 6!
>>>> Thu Sep 27 11:51:02 2007 Found NIC 1 at index 5!
>>>> Thu Sep 27 11:51:02 2007 map seems OK
>>>> Thu Sep 27 11:51:02 2007 Routing took 0 seconds
>>>> Fri Sep 28 13:27:10 2007 Requesting remap from indus4 (00:60:dd:47:b3:e8): verify failed x5p0, nic 1, port 0 route=-8 15 6 reply=-6 -15 8 , remote=ravi1 NIC 0, p0 mac=00:60:dd:47:b3:bf
>>>> Fri Sep 28 13:27:24 2007 6 hosts, 8 nics, 6 xbars, 40 links
>>>> Fri Sep 28 13:27:24 2007 map version is 1987557561
>>>> Fri Sep 28 13:27:24 2007 Found NIC 0 at index 5!
>>>> Fri Sep 28 13:27:24 2007 Cannot find NIC 1 (00:60:dd:47:ad:5e) in map!
>>>> Fri Sep 28 13:27:24 2007 map seems OK
>>>> Fri Sep 28 13:27:24 2007 Routing took 0 seconds
>>>> Fri Sep 28 13:27:44 2007 7 hosts, 10 nics, 6 xbars, 42 links
>>>> Fri Sep 28 13:27:44 2007 map version is 1987557562
>>>> Fri Sep 28 13:27:44 2007 Found NIC 0 at index 7!
>>>> Fri Sep 28 13:27:44 2007 Cannot find NIC 1 (00:60:dd:47:ad:5e) in map!
>>>> Fri Sep 28 13:27:44 2007 map seems OK
>>>> Fri Sep 28 13:27:44 2007 Routing took 0 seconds
>>>>
>>>> Do you have any suggestions or comments on why this error appears,
>>>> and what the solution to this problem might be? I have checked the
>>>> community mailing list archives and found a few topics related to this
>>>> problem, but could not find any solution. Any suggestions or comments
>>>> will be highly appreciated.
>>>>
>>>> The code that I am trying to run is as follows:
>>>>
>>>> #include <stdio.h>
>>>> #include <string.h> /* needed for strcpy; missing in the original listing */
>>>> #include "mpi.h"
>>>>
>>>> int main(int argc, char **argv)
>>>> {
>>>>     int rank, size, tag, rc, i;
>>>>     MPI_Status status;
>>>>     char message[20];
>>>>
>>>>     rc = MPI_Init(&argc, &argv);
>>>>     rc = MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>>     rc = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>     tag = 100;
>>>>     if (rank == 0) {
>>>>         /* Rank 0 sends the greeting to every other rank. */
>>>>         strcpy(message, "Hello, world");
>>>>         for (i = 1; i < size; i++)
>>>>             rc = MPI_Send(message, 13, MPI_CHAR, i, tag, MPI_COMM_WORLD);
>>>>     } else {
>>>>         rc = MPI_Recv(message, 13, MPI_CHAR, 0, tag, MPI_COMM_WORLD,
>>>>                       &status);
>>>>     }
>>>>     printf("node %d : %.13s\n", rank, message);
>>>>     rc = MPI_Finalize();
>>>>     return 0;
>>>> }
>>>>
>>>> Thanks.
>>>> Looking forward to your reply.
>>>> Best regards,
>>>> Hammad Siddiqi
>>>> Center for High Performance Scientific Computing
>>>> NUST Institute of Information Technology,
>>>> National University of Sciences and Technology,
>>>> Rawalpindi, Pakistan.
>>>>
