
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] users Digest, Vol 1052, Issue 10
From: Allan Menezes (amenezes007_at_[hidden])
Date: 2008-10-31 20:47:57


users-request_at_[hidden] wrote:

>Send users mailing list submissions to
> users_at_[hidden]
>
>To subscribe or unsubscribe via the World Wide Web, visit
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>or, via email, send a message with subject or body 'help' to
> users-request_at_[hidden]
>
>You can reach the person managing the list at
> users-owner_at_[hidden]
>
>When replying, please edit your Subject line so it is more specific
>than "Re: Contents of users digest..."
>
>
>Today's Topics:
>
> 1. Re: Problem with openmpi version 1.3b1 beta1 (Ralph Castain)
> 2. Re: problem running Open MPI on Cells (Mi Yan)
>
>
>
Hi Ralph,
    I solved the problem. On the same system, version 1.2.8 works with this
configure line:
./configure --prefix=/opt/openmpi128 --enable-mpi-threads --with-threads=posix --disable-ipv6
but no 1.3 version works with it, because of the --disable-ipv6 flag. To fix
it, I ran make clean and rebuilt with IPv6 enabled, and now it works!
This configure line works for version 1.3 on my system:
./configure --prefix=/opt/openmpi128 --enable-mpi-threads --with-threads=posix
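For anyone hitting the same problem, the full rebuild was roughly the
following (a rough sketch; the exact prefix and node names depend on your
setup, and the scp step just mirrors how I distributed the install tree to
the other nodes earlier):

make clean
./configure --prefix=/opt/openmpi128 --enable-mpi-threads --with-threads=posix
make all
make install
# copy the install tree from the head node to each compute node, e.g.:
scp -r /opt/openmpi128 x2:/opt/    # repeat for x3 .. x6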
Do you still want the old or the new config.log?
Allan Menezes

>----------------------------------------------------------------------
>
>Message: 1
>Date: Fri, 31 Oct 2008 17:02:15 -0600
>From: Ralph Castain <rhc_at_[hidden]>
>Subject: Re: [OMPI users] Problem with openmpi version 1.3b1 beta1
>To: Open MPI Users <users_at_[hidden]>
>Message-ID: <6BCB1362-EA2C-4B4B-AB1A-367ED7739783_at_[hidden]>
>Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
>
>I see you are using IPv6. From what I can tell, we do enable that
>support by default if the underlying system supports it.
>
>My best guess is that either that support is broken (we never test it
>since none of us use IPv6), or our configure system isn't properly
>detecting that it exists.
>
>Can you attach a copy of your config.log? It will tell us what the
>system thinks it should be building.
>
>Thanks
>Ralph
>
>On Oct 31, 2008, at 4:54 PM, Allan Menezes wrote:
>
>
>
>>Date: Fri, 31 Oct 2008 09:34:52 -0600
>>From: Ralph Castain <rhc_at_[hidden]>
>>Subject: Re: [OMPI users] users Digest, Vol 1052, Issue 1
>>To: Open MPI Users <users_at_[hidden]>
>>Message-ID: <0CF28492-B13E-4F82-AC43-C1580F0794D1_at_[hidden]>
>>Content-Type: text/plain; charset="us-ascii"; Format="flowed";
>> DelSp="yes"
>>
>>It looks like the daemon isn't seeing the other interface address
>>on host x2. Can you ssh to x2 and send the contents of ifconfig -a?
>>
>>Ralph
>>
>>On Oct 31, 2008, at 9:18 AM, Allan Menezes wrote:
>>
>>
>>
>>
>>>users-request_at_[hidden] wrote:
>>>
>>>
>>>
>>>>Send users mailing list submissions to
>>>> users_at_[hidden]
>>>>
>>>>To subscribe or unsubscribe via the World Wide Web, visit
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>or, via email, send a message with subject or body 'help' to
>>>> users-request_at_[hidden]
>>>>
>>>>You can reach the person managing the list at
>>>> users-owner_at_[hidden]
>>>>
>>>>When replying, please edit your Subject line so it is more specific
>>>>than "Re: Contents of users digest..."
>>>>
>>>>
>>>>Today's Topics:
>>>>
>>>> 1. Openmpi ver1.3beta1 (Allan Menezes)
>>>> 2. Re: Openmpi ver1.3beta1 (Ralph Castain)
>>>> 3. Re: Equivalent .h files (Benjamin Lamptey)
>>>> 4. Re: Equivalent .h files (Jeff Squyres)
>>>> 5. ompi-checkpoint is hanging (Matthias Hovestadt)
>>>> 6. unsubscibe (Bertrand P. S. Russell)
>>>> 7. Re: ompi-checkpoint is hanging (Tim Mattox)
>>>>
>>>>
>>>>----------------------------------------------------------------------
>>>>
>>>>Message: 1
>>>>Date: Fri, 31 Oct 2008 02:06:09 -0400
>>>>From: Allan Menezes <amenezes007_at_[hidden]>
>>>>Subject: [OMPI users] Openmpi ver1.3beta1
>>>>To: users_at_[hidden]
>>>>Message-ID: <BLU0-SMTP224B5E356302AC7AA4481088200_at_phx.gbl>
>>>>Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>>>
>>>>Hi,
>>>> I built Open MPI version 1.3b1 with the following configure command:
>>>>./configure --prefix=/opt/openmpi13b1 --enable-mpi-threads
>>>>--with-threads=posix --disable-ipv6
>>>>I have six nodes x1..6
>>>>I distributed the /opt/openmpi13b1 with scp to all other nodes
>>>>from the
>>>>head node
>>>>When I run the following command:
>>>>mpirun --prefix /opt/openmpi13b1 --host x1 hostname
>>>>it works, printing out the hostname of x1.
>>>>But when I type
>>>>mpirun --prefix /opt/openmpi13b1 --host x2 hostname
>>>>it hangs and does not give me any output.
>>>>I have a 6-node Intel quad-core cluster with OSCAR and PCI Express
>>>>Gigabit Ethernet for eth0.
>>>>Can somebody advise?
>>>>Thank you very much.
>>>>Allan Menezes
>>>>
>>>>
>>>>------------------------------
>>>>
>>>>Message: 2
>>>>Date: Fri, 31 Oct 2008 02:41:59 -0600
>>>>From: Ralph Castain <rhc_at_[hidden]>
>>>>Subject: Re: [OMPI users] Openmpi ver1.3beta1
>>>>To: Open MPI Users <users_at_[hidden]>
>>>>Message-ID: <E8AF5AAF-99CB-4EFC-AA97-5385CE333AD2_at_[hidden]>
>>>>Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
>>>>
>>>>When you typed the --host x1 command, were you sitting on x1?
>>>>Likewise, when you typed the --host x2 command, were you not on
>>>>host x2?
>>>>
>>>>If the answer to both questions is "yes", then my guess is that
>>>>something is preventing you from launching a daemon on host x2. Try
>>>>adding --leave-session-attached to your cmd line and see if any
>>>>error
>>>>messages appear. And check the FAQ for tips on how to setup for ssh
>>>>launch (I'm assuming that is what you are using).
>>>>
>>>>http://www.open-mpi.org/faq/?category=rsh
>>>>
>>>>Ralph
>>>>
>>>>On Oct 31, 2008, at 12:06 AM, Allan Menezes wrote:
>>>>
>>>>
>>>>
>>>>
>>>>
>>>Hi Ralph,
>>> Yes, that is true. I tried both commands on x1, and version 1.2.8
>>>works on the same setup without a problem.
>>>Here is the output with --leave-session-attached added:
>>>[allan_at_x1 ~]$ mpiexec --prefix /opt/openmpi13b2 --leave-session-
>>>attached -host x2 hostname
>>>[x2.brampton.net:02236] [[1354,0],1]-[[1354,0],0]
>>>mca_oob_tcp_peer_try_connect: connect to 192.168.0.198:0 failed:
>>>Network is unreachable (101)
>>>[x2.brampton.net:02236] [[1354,0],1]-[[1354,0],0]
>>>mca_oob_tcp_peer_try_connect: connect to 192.168.122.1:0 failed:
>>>Network is unreachable (101)
>>>[x2.brampton.net:02236] [[1354,0],1] routed:binomial: Connection
>>>to lifeline [[1354,0],0] lost
>>>--------------------------------------------------------------------------
>>>A daemon (pid 7665) died unexpectedly with status 1 while attempting
>>>to launch so we are aborting.
>>>
>>>There may be more information reported by the environment (see
>>>above).
>>>
>>>This may be because the daemon was unable to find all the needed
>>>shared
>>>libraries on the remote node. You may set your LD_LIBRARY_PATH to
>>>have the
>>>location of the shared libraries on the remote nodes and this will
>>>automatically be forwarded to the remote nodes.
>>>--------------------------------------------------------------------------
>>>--------------------------------------------------------------------------
>>>mpiexec noticed that the job aborted, but has no info as to the
>>>process
>>>that caused that situation.
>>>--------------------------------------------------------------------------
>>>mpiexec: clean termination accomplished
>>>
>>>[allan_at_x1 ~]$
>>>However, my main eth0 IP is 192.168.1.1 and my Internet gateway is
>>>192.168.0.1.
>>>Any solutions?
>>>Allan Menezes
>>>
>>>
>>>
>>>
>>>
>>>
>>>>>Hi,
>>>>>I built Open MPI version 1.3b1 with the following configure command:
>>>>>./configure --prefix=/opt/openmpi13b1 --enable-mpi-threads --with-
>>>>>threads=posix --disable-ipv6
>>>>>I have six nodes x1..6
>>>>>I distributed the /opt/openmpi13b1 with scp to all other nodes from
>>>>>the head node
>>>>>When I run the following command:
>>>>>mpirun --prefix /opt/openmpi13b1 --host x1 hostname
>>>>>it works, printing out the hostname of x1.
>>>>>But when I type
>>>>>mpirun --prefix /opt/openmpi13b1 --host x2 hostname
>>>>>it hangs and does not give me any output.
>>>>>I have a 6-node Intel quad-core cluster with OSCAR and PCI Express
>>>>>Gigabit Ethernet for eth0.
>>>>>Can somebody advise?
>>>>>Thank you very much.
>>>>>Allan Menezes
>>>>>_______________________________________________
>>>>>users mailing list
>>>>>users_at_[hidden]
>>>>>http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>
>>>>>
>>>>>
>>>>>
>>Hi Ralph,
>> It works with Open MPI version 1.2.8; why should it not work with
>>version 1.3?
>>Yes, I can ssh to x2 from x1 and to x1 from x2.
>>Here is the ifconfig -a for x1:
>>eth0 Link encap:Ethernet HWaddr 00:1B:21:02:89:DA
>>inet addr:192.168.1.1 Bcast:192.168.1.255 Mask:255.255.255.0
>> inet6 addr: fe80::21b:21ff:fe02:89da/64 Scope:Link
>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>> RX packets:44906 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:77644 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:1000
>> RX bytes:3309896 (3.1 MiB) TX bytes:101134505 (96.4 MiB)
>> Memory:feae0000-feb00000
>>
>>eth1 Link encap:Ethernet HWaddr 00:0E:0C:BC:AB:6D
>>inet addr:192.168.3.1 Bcast:192.168.3.255 Mask:255.255.255.0
>> inet6 addr: fe80::20e:cff:febc:ab6d/64 Scope:Link
>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>> RX packets:124 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:133 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:1000
>> RX bytes:7440 (7.2 KiB) TX bytes:10027 (9.7 KiB)
>>
>>eth2 Link encap:Ethernet HWaddr 00:1B:FC:A0:A7:92
>>inet addr:192.168.7.1 Bcast:192.168.7.255 Mask:255.255.255.0
>> inet6 addr: fe80::21b:fcff:fea0:a792/64 Scope:Link
>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>> RX packets:159 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:158 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:1000
>> RX bytes:10902 (10.6 KiB) TX bytes:13691 (13.3 KiB)
>> Interrupt:17
>>
>>eth4 Link encap:Ethernet HWaddr 00:0E:0C:B9:50:A3
>>inet addr:192.168.0.198 Bcast:192.168.0.255 Mask:255.255.255.0
>> inet6 addr: fe80::20e:cff:feb9:50a3/64 Scope:Link
>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>> RX packets:25111 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:11633 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:1000
>> RX bytes:24133775 (23.0 MiB) TX bytes:833868 (814.3 KiB)
>>
>>lo Link encap:Local Loopback inet addr:127.0.0.1
>>Mask:255.0.0.0
>> inet6 addr: ::1/128 Scope:Host
>> UP LOOPBACK RUNNING MTU:16436 Metric:1
>> RX packets:28973 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:28973 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:0
>> RX bytes:1223211 (1.1 MiB) TX bytes:1223211 (1.1 MiB)
>>
>>pan0 Link encap:Ethernet HWaddr CA:00:CE:02:90:90
>>BROADCAST MULTICAST MTU:1500 Metric:1
>> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:0
>> RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
>>
>>sit0 Link encap:IPv6-in-IPv4 NOARP MTU:1480 Metric:1
>> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:0
>> RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
>>
>>virbr0 Link encap:Ethernet HWaddr EA:6D:E7:85:8D:E7
>>inet addr:192.168.122.1 Bcast:192.168.122.255 Mask:255.255.255.0
>> inet6 addr: fe80::e86d:e7ff:fe85:8de7/64 Scope:Link
>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:30 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:0
>> RX bytes:0 (0.0 b) TX bytes:5083 (4.9 KiB)
>>
>>Here is the ifconfig -a for x2:
>>eth0 Link encap:Ethernet HWaddr 00:1B:21:02:DE:E9
>>inet addr:192.168.1.2 Bcast:192.168.1.255 Mask:255.255.255.0
>> inet6 addr: fe80::21b:21ff:fe02:dee9/64 Scope:Link
>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>> RX packets:565 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:565 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:1000
>> RX bytes:181079 (176.8 KiB) TX bytes:106650 (104.1 KiB)
>> Memory:feae0000-feb00000
>>
>>eth1 Link encap:Ethernet HWaddr 00:0E:0C:BC:B1:7D
>>inet addr:192.168.3.2 Bcast:192.168.3.255 Mask:255.255.255.0
>> inet6 addr: fe80::20e:cff:febc:b17d/64 Scope:Link
>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>> RX packets:11 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:20 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:1000
>> RX bytes:660 (660.0 b) TX bytes:1136 (1.1 KiB)
>>
>>eth2 Link encap:Ethernet HWaddr 00:1F:C6:27:1C:79
>>inet addr:192.168.7.2 Bcast:192.168.7.255 Mask:255.255.255.0
>> inet6 addr: fe80::21f:c6ff:fe27:1c79/64 Scope:Link
>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>> RX packets:11 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:19 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:1000
>> RX bytes:506 (506.0 b) TX bytes:1094 (1.0 KiB)
>> Interrupt:17
>>
>>lo Link encap:Local Loopback inet addr:127.0.0.1
>>Mask:255.0.0.0
>> inet6 addr: ::1/128 Scope:Host
>> UP LOOPBACK RUNNING MTU:16436 Metric:1
>> RX packets:1604 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:1604 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:0
>> RX bytes:140216 (136.9 KiB) TX bytes:140216 (136.9 KiB)
>>
>>sit0 Link encap:IPv6-in-IPv4 NOARP MTU:1480 Metric:1
>> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:0
>> RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
>>
>>Any help would be appreciated!
>>Allan Menezes
>>_______________________________________________
>>users mailing list
>>users_at_[hidden]
>>http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>
>
>
>------------------------------
>
>Message: 2
>Date: Fri, 31 Oct 2008 20:33:33 -0400
>From: Mi Yan <miyan_at_[hidden]>
>Subject: Re: [OMPI users] problem running Open MPI on Cells
>To: Open MPI Users <users_at_[hidden]>
>Cc: Open MPI Users <users_at_[hidden]>, users-bounces_at_[hidden]
>Message-ID:
> <OFD6258791.6AD0754A-ON852574F4.0001B465-852574F4.00030FF6_at_[hidden]>
>Content-Type: text/plain; charset="us-ascii"
>
>
>Where did you put the environment variables related to the MCF license
>file and the MCF shared libraries?
>What is your default shell?
>
>Did your test indicate the following?
>Suppose you have 4 nodes:
>on node 1, "mpirun -np 4 --host node1,node2,node3,node4 hostname" works,
>but "mpirun -np 4 --host node1,node2,node3,node4 foocbe" does not work,
>where foocbe is an executable generated with MCF.
>
>Is it possible that the MCF license is limited to a few concurrent uses?
>E.g., if the license is limited to 4 concurrent uses, the MPI application
>will fail on 8 nodes?
>
>Regards,
>Mi
>
>
>
> From: Hahn Kim <hgk_at_[hidden]>
> Sent by: users-bounces_at_open-mpi.org
> Date: 10/31/2008 03:38 PM
> To: Open MPI Users <users_at_[hidden]>
> Subject: [OMPI users] problem running Open MPI on Cells
> Please respond to: Open MPI Users <users_at_open-mpi.org>
>
>Hello,
>
>I'm having problems using Open MPI on a cluster of Mercury Computer's
>Cell Accelerator Boards (CABs).
>
>We have an MPI application that is running on multiple CABs. The
>application uses Mercury's MultiCore Framework (MCF) to use the Cell's
>SPEs. Here's the basic problem. I can log into each CAB and run the
>application in serial directly from the command line (i.e. without
>using mpirun) without a problem. I can also launch a serial job onto
>each CAB from another machine using mpirun without a problem.
>
>The problem occurs when I try to launch onto multiple CABs using
>mpirun. MCF requires a license file. After the application
>initializes MPI, it tries to initialized MCF on each node. The
>initialization routine loads the MCF license file and checks for valid
>license keys. If the keys are valid, then it continues to initialize
>MCF. If not, it throws an error.
>
>When I run on multiple CABs, most of the time several of the CABs
>throw an error saying MCF cannot find a valid license key. The
>strange thing is that this behavior doesn't appear when I launch serial
>jobs using MCF, only multiple CABs. Additionally, the errors are
>inconsistent. Not all the CABs throw an error, sometimes a few of
>them error out, sometimes all of them, sometimes none.
>
>I've talked with the Mercury folks and they're just as stumped as I
>am. The only thing we can think of is that OpenMPI is somehow
>modifying the environment and is interfering with MCF, but we can't
>think of any reason why.
>
>Any ideas out there? Thanks.
>
>Hahn
>
>--
>Hahn Kim, hgk_at_[hidden]
>MIT Lincoln Laboratory
>244 Wood St., Lexington, MA 02420
>Tel: 781-981-0940, Fax: 781-981-5255
>
>
>
>
>
>
>_______________________________________________
>users mailing list
>users_at_[hidden]
>http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>------------------------------
>
>_______________________________________________
>users mailing list
>users_at_[hidden]
>http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>End of users Digest, Vol 1052, Issue 10
>***************************************
>
>
>