Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Fwd: problem for multiple clusters using mpirun
From: Hamid Saeed (e.hamidsaeed_at_[hidden])
Date: 2014-03-21 10:24:35


Here is the output of /sbin/ifconfig on karp:

hsaeed_at_karp:~$ /sbin/ifconfig
br0 Link encap:Ethernet HWaddr 00:25:90:59:c9:ba
          inet addr:134.106.3.231 Bcast:134.106.3.255 Mask:255.255.255.0
          inet6 addr: fe80::225:90ff:fe59:c9ba/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:49080961 errors:0 dropped:50263 overruns:0 frame:0
          TX packets:43279252 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:41348407558 (38.5 GiB) TX bytes:80505842745 (74.9 GiB)

br1 Link encap:Ethernet HWaddr 00:25:90:59:c9:bb
          inet addr:134.106.53.231 Bcast:134.106.53.255 Mask:255.255.255.0
          inet6 addr: fe80::225:90ff:fe59:c9bb/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:41573060 errors:0 dropped:50261 overruns:0 frame:0
          TX packets:1693509 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:6177072160 (5.7 GiB) TX bytes:230617435 (219.9 MiB)

br2 Link encap:Ethernet HWaddr 00:c0:0a:ec:02:e7
          inet addr:10.231.2.231 Bcast:10.231.2.255 Mask:255.255.255.0
          UP BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

eth0 Link encap:Ethernet HWaddr 00:25:90:59:c9:ba
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:69108377 errors:0 dropped:0 overruns:0 frame:0
          TX packets:86459066 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:43533091399 (40.5 GiB) TX bytes:83359370885 (77.6 GiB)
          Memory:dfe60000-dfe80000

eth1 Link encap:Ethernet HWaddr 00:25:90:59:c9:bb
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:43531546 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1716151 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:7201915977 (6.7 GiB) TX bytes:232026383 (221.2 MiB)
          Memory:dfee0000-dff00000

lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:10890707 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10890707 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:36194379576 (33.7 GiB) TX bytes:36194379576 (33.7 GiB)

tap0 Link encap:Ethernet HWaddr 00:c0:0a:ec:02:e7
          UP BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:500
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

When I execute the following line:

hsaeed_at_karp:~/Task4_mpi/scatterv$ mpiexec -n 2 -host wirth,karp ./a.out

I receive this error:

[wirth][[59430,1],0][btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect]
connect() to 10.231.2.231 failed: Connection refused (111)

NOTE: karp and wirth are two machines in an SSH cluster.
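
As a rough cross-check of the listing against the error, the address in the connect() failure can be looked up in the ifconfig output. The helper below is only a sketch: the name iface_for_addr is made up for illustration, and it assumes the older net-tools layout shown above ("inet addr:" lines followed by a flags line), not the newer "inet ... netmask" format.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI or net-tools).
# Reads classic ifconfig output on stdin and prints the interface that owns
# the given IPv4 address, plus whether that interface has the RUNNING flag.
# Usage: /sbin/ifconfig | iface_for_addr 10.231.2.231
iface_for_addr() {
    awk -v addr="$1" '
        # A line starting in column 1 opens a new interface stanza.
        /^[^ \t]/ { name = $1; owner = 0 }
        # The trailing space keeps 10.231.2.23 from matching 10.231.2.231.
        index($0, "inet addr:" addr " ") { owner = 1; found = name; running = "no" }
        # The flags line follows the address line within the same stanza.
        owner && /RUNNING/ { running = "yes" }
        END { if (found != "") print found, "RUNNING=" running }
    '
}
```

Run against the listing above with 10.231.2.231, this should print "br2 RUNNING=no": the refused address belongs to br2, which is UP but not RUNNING. If that turns out to be the cause, one thing to try is restricting the TCP BTL to an interface both hosts can reach, for example "mpiexec --mca btl_tcp_if_include br0 -n 2 -host wirth,karp ./a.out". The choice of br0 is an assumption, since the interfaces on wirth are not shown here.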

On Fri, Mar 21, 2014 at 3:13 PM, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]> wrote:

> On Mar 21, 2014, at 10:09 AM, Hamid Saeed <e.hamidsaeed_at_[hidden]> wrote:
>
> > > I think I have a TCP connection. As far as I know, my cluster is not
> > > configured for InfiniBand (IB).
>
> Ok.
>
> > > But even with TCP connections, these fail:
> > >
> > > mpirun -n 2 -host master,node001 --mca btl tcp,sm,self ./helloworldmpi
> > > mpirun -n 2 -host master,node001 ./helloworldmpi
> > >
> > > These lines are not working; they output an error like:
> > > [btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect]
> > > connect() to xx.xxx.x.xxx failed: Connection refused (111)
>
> What are the IP addresses reported by connect()? (i.e., the address you
> X'ed out)
>
> Send the output from ifconfig on each of your servers. Note that some
> Linux distributions do not put ifconfig in the default PATH of normal
> users; look for it in /sbin/ifconfig or /usr/sbin/ifconfig.
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>

-- 
_______________________________________________
Hamid Saeed
CoSynth GmbH & Co. KG
Escherweg 2 - 26121 Oldenburg - Germany
Tel +49 441 9722 738 | Fax -278
http://www.cosynth.com
_______________________________________________