Open MPI logo

Open MPI User's Mailing List Archives

  |   Home   |   Support   |   FAQ   |  

This web mail archive is frozen.

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Click here to be taken to the new web archives of this list; it includes all the mails that are in this frozen archive plus all new mails that have been sent to the list since it was migrated to the new archives.

Subject: Re: [OMPI users] users Digest, Vol 1052, Issue 10
From: Allan Menezes (amenezes007_at_[hidden])
Date: 2008-10-31 20:47:57


users-request_at_[hidden] wrote:

>Send users mailing list submissions to
> users_at_[hidden]
>
>To subscribe or unsubscribe via the World Wide Web, visit
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>or, via email, send a message with subject or body 'help' to
> users-request_at_[hidden]
>
>You can reach the person managing the list at
> users-owner_at_[hidden]
>
>When replying, please edit your Subject line so it is more specific
>than "Re: Contents of users digest..."
>
>
>Today's Topics:
>
> 1. Re: Problem with openmpi version 1.3b1 beta1 (Ralph Castain)
> 2. Re: problem running Open MPI on Cells (Mi Yan)
>
>
>
Hi Ralph,
    I solved the problem. With Ver 1.28 same system this configure works
./configure --prefix=/opt/openmpi128 --enable-mpi-threads
--with-threads=posix --disable-ipv6 but does not work with any ver 1.3
beacuse of ipv6 so to fix it i rebuilt it after make clean with ipv6
enabled and it works!
This configure for ver 1.3 works on my system
./configure --prefix=/opt/openmpi128 --enable-mpi-threads
--with-threads=posix
Do you still want the old or the new config.log?
Allan Menezes

>----------------------------------------------------------------------
>
>Message: 1
>Date: Fri, 31 Oct 2008 17:02:15 -0600
>From: Ralph Castain <rhc_at_[hidden]>
>Subject: Re: [OMPI users] Problem with openmpi version 1.3b1 beta1
>To: Open MPI Users <users_at_[hidden]>
>Message-ID: <6BCB1362-EA2C-4B4B-AB1A-367ED7739783_at_[hidden]>
>Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
>
>I see you are using IPv6. From what I can tell, we do enable that
>support by default if the underlying system supports it.
>
>My best guess is that either that support is broken (we never test it
>since none of us use IPv6), or our configure system isn't properly
>detecting that it exists.
>
>Can you attach a copy of your config.log? It will tell us what the
>system thinks it should be building.
>
>Thanks
>Ralph
>
>On Oct 31, 2008, at 4:54 PM, Allan Menezes wrote:
>
>
>
>>Date: Fri, 31 Oct 2008 09:34:52 -0600
>>From: Ralph Castain <rhc_at_[hidden]>
>>Subject: Re: [OMPI users] users Digest, Vol 1052, Issue 1
>>To: Open MPI Users <users_at_[hidden]>
>>Message-ID: <0CF28492-B13E-4F82-AC43-C1580F0794D1_at_[hidden]>
>>Content-Type: text/plain; charset="us-ascii"; Format="flowed";
>> DelSp="yes"
>>
>>It looks like the daemon isn't seeing the other interface address
>>on host x2. Can you ssh to x2 and send the contents of ifconfig -a?
>>
>>Ralph
>>
>>On Oct 31, 2008, at 9:18 AM, Allan Menezes wrote:
>>
>>
>>
>>
>>>users-request_at_[hidden] wrote:
>>>
>>>
>>>
>>>>Send users mailing list submissions to
>>>> users_at_[hidden]
>>>>
>>>>To subscribe or unsubscribe via the World Wide Web, visit
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>or, via email, send a message with subject or body 'help' to
>>>> users-request_at_[hidden]
>>>>
>>>>You can reach the person managing the list at
>>>> users-owner_at_[hidden]
>>>>
>>>>When replying, please edit your Subject line so it is more specific
>>>>than "Re: Contents of users digest..."
>>>>
>>>>
>>>>Today's Topics:
>>>>
>>>> 1. Openmpi ver1.3beta1 (Allan Menezes)
>>>> 2. Re: Openmpi ver1.3beta1 (Ralph Castain)
>>>> 3. Re: Equivalent .h files (Benjamin Lamptey)
>>>> 4. Re: Equivalent .h files (Jeff Squyres)
>>>> 5. ompi-checkpoint is hanging (Matthias Hovestadt)
>>>> 6. unsubscibe (Bertrand P. S. Russell)
>>>> 7. Re: ompi-checkpoint is hanging (Tim Mattox)
>>>>
>>>>
>>>>----------------------------------------------------------------------
>>>>
>>>>Message: 1
>>>>Date: Fri, 31 Oct 2008 02:06:09 -0400
>>>>From: Allan Menezes <amenezes007_at_[hidden]>
>>>>Subject: [OMPI users] Openmpi ver1.3beta1
>>>>To: users_at_[hidden]
>>>>Message-ID: <BLU0-SMTP224B5E356302AC7AA4481088200_at_phx.gbl>
>>>>Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>>>
>>>>Hi,
>>>> I built open mpi version 1.3b1 withe following cofigure command:
>>>>./configure --prefix=/opt/openmpi13b1 --enable-mpi-threads
>>>>--with-threads=posix --disable-ipv6
>>>>I have six nodes x1..6
>>>>I distributed the /opt/openmpi13b1 with scp to all other nodes
>>>>from the
>>>>head node
>>>>When i run the following command:
>>>>mpirun --prefix /opt/openmpi13b1 --host x1 hostname it works on x1
>>>>printing out the hostname of x1
>>>>But when i type
>>>>mpirun --prefix /opt/openmpi13b1 --host x2 hostname it hangs and
>>>>does
>>>>not give me any output
>>>>I have a 6 node intel quad core cluster with OSCAR and pci express
>>>>gigabit ethernet for eth0
>>>>Can somebody advise?
>>>>Thank you very much.
>>>>Allan Menezes
>>>>
>>>>
>>>>------------------------------
>>>>
>>>>Message: 2
>>>>Date: Fri, 31 Oct 2008 02:41:59 -0600
>>>>From: Ralph Castain <rhc_at_[hidden]>
>>>>Subject: Re: [OMPI users] Openmpi ver1.3beta1
>>>>To: Open MPI Users <users_at_[hidden]>
>>>>Message-ID: <E8AF5AAF-99CB-4EFC-AA97-5385CE333AD2_at_[hidden]>
>>>>Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
>>>>
>>>>When you typed the --host x1 command, were you sitting on x1?
>>>>Likewise, when you typed the --host x2 command, were you not on
>>>>host x2?
>>>>
>>>>If the answer to both questions is "yes", then my guess is that
>>>>something is preventing you from launching a daemon on host x2. Try
>>>>adding --leave-session-attached to your cmd line and see if any
>>>>error
>>>>messages appear. And check the FAQ for tips on how to setup for ssh
>>>>launch (I'm assuming that is what you are using).
>>>>
>>>>http://www.open-mpi.org/faq/?category=rsh
>>>>
>>>>Ralph
>>>>
>>>>On Oct 31, 2008, at 12:06 AM, Allan Menezes wrote:
>>>>
>>>>
>>>>
>>>>
>>>>
>>>Hi Ralph,
>>> Yes that is true I tried both commands on x1 and ver 1.28 works
>>>on the same setup without a problem.
>>>Here is the output with the added
>>>--leave-session-attached
>>>[allan_at_x1 ~]$ mpiexec --prefix /opt/openmpi13b2 --leave-session-
>>>attached -host x2 hostname
>>>[x2.brampton.net:02236] [[1354,0],1]-[[1354,0],0]
>>>mca_oob_tcp_peer_try_connect: connect to 192.168.0.198:0 failed:
>>>Network is unreachable (101)
>>>[x2.brampton.net:02236] [[1354,0],1]-[[1354,0],0]
>>>mca_oob_tcp_peer_try_connect: connect to 192.168.122.1:0 failed:
>>>Network is unreachable (101)
>>>[x2.brampton.net:02236] [[1354,0],1] routed:binomial: Connection
>>>to lifeline [[1354,0],0] lost
>>>--------------------------------------------------------------------------
>>>A daemon (pid 7665) died unexpectedly with status 1 while attempting
>>>to launch so we are aborting.
>>>
>>>There may be more information reported by the environment (see
>>>above).
>>>
>>>This may be because the daemon was unable to find all the needed
>>>shared
>>>libraries on the remote node. You may set your LD_LIBRARY_PATH to
>>>have the
>>>location of the shared libraries on the remote nodes and this will
>>>automatically be forwarded to the remote nodes.
>>>--------------------------------------------------------------------------
>>>--------------------------------------------------------------------------
>>>mpiexec noticed that the job aborted, but has no info as to the
>>>process
>>>that caused that situation.
>>>--------------------------------------------------------------------------
>>>mpiexec: clean termination accomplished
>>>
>>>[allan_at_x1 ~]$
>>>However my main eth0 IP is 192.168.1.1 and internet gate way is
>>>192.168.0.1
>>>Any solutions?
>>>Allan Menezes
>>>
>>>
>>>
>>>
>>>
>>>
>>>>>Hi,
>>>>>I built open mpi version 1.3b1 withe following cofigure command:
>>>>>./configure --prefix=/opt/openmpi13b1 --enable-mpi-threads --with-
>>>>>threads=posix --disable-ipv6
>>>>>I have six nodes x1..6
>>>>>I distributed the /opt/openmpi13b1 with scp to all other nodes from
>>>>>the head node
>>>>>When i run the following command:
>>>>>mpirun --prefix /opt/openmpi13b1 --host x1 hostname it works on x1
>>>>>printing out the hostname of x1
>>>>>But when i type
>>>>>mpirun --prefix /opt/openmpi13b1 --host x2 hostname it hangs and
>>>>>does not give me any output
>>>>>I have a 6 node intel quad core cluster with OSCAR and pci express
>>>>>gigabit ethernet for eth0
>>>>>Can somebody advise?
>>>>>Thank you very much.
>>>>>Allan Menezes
>>>>>_______________________________________________
>>>>>users mailing list
>>>>>users_at_[hidden]
>>>>>http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>
>>>>>
>>>>>
>>>>>
>>Hi Ralph,
>> It works for openmpi version 1.28 why should it not work for
>>version 1.3?
>>Yes I can ssh to x2 from x1 and x1 from x2.
>>Here if the ifconfig -a for x1:
>>eth0 Link encap:Ethernet HWaddr 00:1B:21:02:89:DA
>>inet addr:192.168.1.1 Bcast:192.168.1.255 Mask:255.255.255.0
>> inet6 addr: fe80::21b:21ff:fe02:89da/64 Scope:Link
>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>> RX packets:44906 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:77644 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:1000
>> RX bytes:3309896 (3.1 MiB) TX bytes:101134505 (96.4 MiB)
>> Memory:feae0000-feb00000
>>
>>eth1 Link encap:Ethernet HWaddr 00:0E:0C:BC:AB:6D
>>inet addr:192.168.3.1 Bcast:192.168.3.255 Mask:255.255.255.0
>> inet6 addr: fe80::20e:cff:febc:ab6d/64 Scope:Link
>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>> RX packets:124 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:133 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:1000
>> RX bytes:7440 (7.2 KiB) TX bytes:10027 (9.7 KiB)
>>
>>eth2 Link encap:Ethernet HWaddr 00:1B:FC:A0:A7:92
>>inet addr:192.168.7.1 Bcast:192.168.7.255 Mask:255.255.255.0
>> inet6 addr: fe80::21b:fcff:fea0:a792/64 Scope:Link
>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>> RX packets:159 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:158 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:1000
>> RX bytes:10902 (10.6 KiB) TX bytes:13691 (13.3 KiB)
>> Interrupt:17
>>
>>eth4 Link encap:Ethernet HWaddr 00:0E:0C:B9:50:A3
>>inet addr:192.168.0.198 Bcast:192.168.0.255 Mask:255.255.255.0
>> inet6 addr: fe80::20e:cff:feb9:50a3/64 Scope:Link
>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>> RX packets:25111 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:11633 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:1000
>> RX bytes:24133775 (23.0 MiB) TX bytes:833868 (814.3 KiB)
>>
>>lo Link encap:Local Loopback inet addr:127.0.0.1
>>Mask:255.0.0.0
>> inet6 addr: ::1/128 Scope:Host
>> UP LOOPBACK RUNNING MTU:16436 Metric:1
>> RX packets:28973 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:28973 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:0
>> RX bytes:1223211 (1.1 MiB) TX bytes:1223211 (1.1 MiB)
>>
>>pan0 Link encap:Ethernet HWaddr CA:00:CE:02:90:90
>>BROADCAST MULTICAST MTU:1500 Metric:1
>> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:0
>> RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
>>
>>sit0 Link encap:IPv6-in-IPv4 NOARP MTU:1480 Metric:1
>> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:0
>> RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
>>
>>virbr0 Link encap:Ethernet HWaddr EA:6D:E7:85:8D:E7
>>inet addr:192.168.122.1 Bcast:192.168.122.255 Mask:255.255.255.0
>> inet6 addr: fe80::e86d:e7ff:fe85:8de7/64 Scope:Link
>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:30 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:0
>> RX bytes:0 (0.0 b) TX bytes:5083 (4.9 KiB)
>>
>>Here is the ifconfig -a for x2:
>>eth0 Link encap:Ethernet HWaddr 00:1B:21:02:DE:E9
>>inet addr:192.168.1.2 Bcast:192.168.1.255 Mask:255.255.255.0
>> inet6 addr: fe80::21b:21ff:fe02:dee9/64 Scope:Link
>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>> RX packets:565 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:565 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:1000
>> RX bytes:181079 (176.8 KiB) TX bytes:106650 (104.1 KiB)
>> Memory:feae0000-feb00000
>>
>>eth1 Link encap:Ethernet HWaddr 00:0E:0C:BC:B1:7D
>>inet addr:192.168.3.2 Bcast:192.168.3.255 Mask:255.255.255.0
>> inet6 addr: fe80::20e:cff:febc:b17d/64 Scope:Link
>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>> RX packets:11 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:20 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:1000
>> RX bytes:660 (660.0 b) TX bytes:1136 (1.1 KiB)
>>
>>eth2 Link encap:Ethernet HWaddr 00:1F:C6:27:1C:79
>>inet addr:192.168.7.2 Bcast:192.168.7.255 Mask:255.255.255.0
>> inet6 addr: fe80::21f:c6ff:fe27:1c79/64 Scope:Link
>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>> RX packets:11 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:19 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:1000
>> RX bytes:506 (506.0 b) TX bytes:1094 (1.0 KiB)
>> Interrupt:17
>>
>>lo Link encap:Local Loopback inet addr:127.0.0.1
>>Mask:255.0.0.0
>> inet6 addr: ::1/128 Scope:Host
>> UP LOOPBACK RUNNING MTU:16436 Metric:1
>> RX packets:1604 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:1604 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:0
>> RX bytes:140216 (136.9 KiB) TX bytes:140216 (136.9 KiB)
>>
>>sit0 Link encap:IPv6-in-IPv4 NOARP MTU:1480 Metric:1
>> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:0
>> RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
>>
>>Any help would be appreciated!
>>Allan Menezes
>>_______________________________________________
>>users mailing list
>>users_at_[hidden]
>>http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>
>
>
>------------------------------
>
>Message: 2
>Date: Fri, 31 Oct 2008 20:33:33 -0400
>From: Mi Yan <miyan_at_[hidden]>
>Subject: Re: [OMPI users] problem running Open MPI on Cells
>To: Open MPI Users <users_at_[hidden]>
>Cc: Open MPI Users <users_at_[hidden]>, users-bounces_at_[hidden]
>Message-ID:
> <OFD6258791.6AD0754A-ON852574F4.0001B465-852574F4.00030FF6_at_[hidden]>
>Content-Type: text/plain; charset="us-ascii"
>
>
>Where did you put the environment variable related to MCF licence file and
>MCF share libraries?
>What is your default shell?
>
>Did you test indicate the following?
>Suppose you have 4 nodes,
>on node 1, " mpirun -np 4 --host node1,node2,node3,node4 hostname" works,
>but "mpirun -np4 --host node1,node2,node3,node4 foocbe" does not work,
>where foocbe is executable generated with MCF.
>
> It is possible that MCF license is limited to a few concurrent use? e.g.
>the license is limited to 4 current use, and mpi application will fails
>on 8 nodes?
>
>Regards,
>Mi
>
>
>
> Hahn Kim
> <hgk_at_[hidden]>
> Sent by: To
> users-bounces_at_ope Open MPI Users <users_at_[hidden]>
> n-mpi.org cc
>
> Subject
> 10/31/2008 03:38 [OMPI users] problem running Open
> PM MPI on Cells
>
>
> Please respond to
> Open MPI Users
> <users_at_open-mpi.o
> rg>
>
>
>
>
>
>
>Hello,
>
>I'm having problems using Open MPI on a cluster of Mercury Computer's
>Cell Accelerator Boards (CABs).
>
>We have an MPI application that is running on multiple CABs. The
>application uses Mercury's MultiCore Framework (MCF) to use the Cell's
>SPEs. Here's the basic problem. I can log into each CAB and run the
>application in serial directly from the command line (i.e. without
>using mpirun) without a problem. I can also launch a serial job onto
>each CAB from another machine using mpirun without a problem.
>
>The problem occurs when I try to launch onto multiple CABs using
>mpirun. MCF requires a license file. After the application
>initializes MPI, it tries to initialized MCF on each node. The
>initialization routine loads the MCF license file and checks for valid
>license keys. If the keys are valid, then it continues to initialize
>MCF. If not, it throws an error.
>
>When I run on multiple CABs, most of the time several of the CABs
>throw an error saying MCF cannot find a valid license key. The
>strange this is that this behavior doesn't appear when I launch serial
>jobs using MCF, only multiple CABs. Additionally, the errors are
>inconsistent. Not all the CABs throw an error, sometimes a few of
>them error out, sometimes all of them, sometimes none.
>
>I've talked with the Mercury folks and they're just as stumped as I
>am. The only thing we can think of is that OpenMPI is somehow
>modifying the environment and is interfering with MCF, but we can't
>think of any reason why.
>
>Any ideas out there? Thanks.
>
>Hahn
>
>--
>Hahn Kim, hgk_at_[hidden]
>MIT Lincoln Laboratory
>244 Wood St., Lexington, MA 02420
>Tel: 781-981-0940, Fax: 781-981-5255
>
>
>
>
>
>
>_______________________________________________
>users mailing list
>users_at_[hidden]
>http://www.open-mpi.org/mailman/listinfo.cgi/users
>-------------- next part --------------
>HTML attachment scrubbed and removed
>-------------- next part --------------
>A non-text attachment was scrubbed...
>Name: graycol.gif
>Type: image/gif
>Size: 105 bytes
>Desc: not available
>URL: <http://www.open-mpi.org/MailArchives/users/attachments/20081031/2d67d208/attachment.gif>
>-------------- next part --------------
>A non-text attachment was scrubbed...
>Name: pic18585.gif
>Type: image/gif
>Size: 1255 bytes
>Desc: not available
>URL: <http://www.open-mpi.org/MailArchives/users/attachments/20081031/2d67d208/attachment-0001.gif>
>-------------- next part --------------
>A non-text attachment was scrubbed...
>Name: ecblank.gif
>Type: image/gif
>Size: 45 bytes
>Desc: not available
>URL: <http://www.open-mpi.org/MailArchives/users/attachments/20081031/2d67d208/attachment-0002.gif>
>
>------------------------------
>
>_______________________________________________
>users mailing list
>users_at_[hidden]
>http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>End of users Digest, Vol 1052, Issue 10
>***************************************
>
>
>