
Open MPI User's Mailing List Archives


Subject: [OMPI users] Fwd: problem for multiple clusters using mpirun
From: Hamid Saeed (e.hamidsaeed_at_[hidden])
Date: 2014-03-21 10:09:11


---------- Forwarded message ----------
From: Jeff Squyres (jsquyres) <jsquyres_at_[hidden]>
Date: Fri, Mar 21, 2014 at 3:05 PM
Subject: Re: problem for multiple clusters using mpirun
To: Hamid Saeed <e.hamidsaeed_at_[hidden]>

Please reply on the mailing list; more people can reply that way, and the
answers to your questions become google-able for people with similar
questions.

On Mar 21, 2014, at 10:03 AM, Hamid Saeed <e.hamidsaeed_at_[hidden]> wrote:

> Hello Jeff,
>
> Sorry to bother you again.
>
> I think I have a TCP connection. As far as I know, my cluster is not
configured for InfiniBand (IB).
>
> But even for TCP connections:
>
> mpirun -n 2 -host master,node001 --mca btl tcp,sm,self ./helloworldmpi
> mpirun -n 2 -host master,node001 ./helloworldmpi
>
> These lines are not working; they output an error like:
>
> [btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect] connect()
to xx.xxx.x.xxx failed: Connection refused (111)
>
>
> and the program hangs until I press
> Ctrl+C.
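A "Connection refused (111)" from the TCP BTL usually means the remote node is not listening on, or a firewall is blocking, the ports Open MPI tries to use. A rough sketch of how one might diagnose this from the master node (the hostnames are the ones from the commands in this thread; the interface name and port range below are assumptions, not values from the original report):

```shell
# First confirm plain TCP connectivity between the nodes at all
# (port 22 is just an example of a port known to be open, e.g. sshd):
nc -z node001 22 && echo "node001 reachable over TCP"

# Open MPI's TCP BTL uses ephemeral ports by default. If a firewall is
# running, pin the BTL to a fixed range and open that range on both nodes:
mpirun -n 2 -host master,node001 \
    --mca btl tcp,sm,self \
    --mca btl_tcp_port_min_v4 10000 \
    --mca btl_tcp_port_range_v4 100 \
    ./helloworldmpi

# If the nodes have several interfaces, Open MPI may try to connect over
# an unroutable one; restrict it to the shared network (eth0 is a guess):
mpirun -n 2 -host master,node001 \
    --mca btl tcp,sm,self \
    --mca btl_tcp_if_include eth0 \
    ./helloworldmpi
```

`btl_tcp_port_min_v4`, `btl_tcp_port_range_v4`, and `btl_tcp_if_include` are standard Open MPI MCA parameters; whether they fix the problem depends on the cluster's firewall and network layout.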
> On Fri, Mar 21, 2014 at 2:47 PM, Hamid Saeed <e.hamidsaeed_at_[hidden]> wrote:
>
> Hello,
>
> Thanks for the answer.
>
> Can you kindly explain what an IB connection is?
>
> thanks
>
> regards
>
>
>
> On Fri, Mar 21, 2014 at 2:44 PM, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]> wrote:
> Was Ralph's answer not enough? I think he hit the nail on the head...
>
>
> On Mar 21, 2014, at 9:29 AM, Hamid Saeed <e.hamidsaeed_at_[hidden]> wrote:
>
> > Hello:
> >
> > I have learnt about mpi from you using different web portals.
> > I hope you can help me in solving this problem too.
> >
> > * I can compile my helloworld.c program using mpicc, and I have confirmed that the program runs correctly on another working cluster, so I think the local paths are set up correctly and the program definitely works.
> >
> > * If I execute mpirun from my master node, using only the master node, helloworld executes correctly:
> >
> > mpirun -n 1 -host master --mca btl sm,openib,self ./helloworldmpi
> > hello world from process 0 of 1
> >
> > * If I execute mpirun from my master node, using only the worker node, helloworld executes correctly:
> >
> > mpirun -n 1 -host node001 --mca btl sm,openib,self ./helloworldmpi
> > hello world from process 0 of 1
> >
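For reference, the actual helloworld.c is not shown anywhere in the thread, so the following is only an assumed equivalent of the program being run, a minimal MPI hello world consistent with the output quoted above:

```c
/* Minimal MPI hello world -- an assumed equivalent of the helloworldmpi
 * program used in this thread. Compile with:
 *   mpicc -o helloworldmpi helloworld.c
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);               /* must precede any other MPI call */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes */

    printf("hello world from process %d of %d\n", rank, size);

    MPI_Finalize();                       /* every rank calls this before exiting */
    return 0;
}
```

Run with `mpirun -n 2 ./helloworldmpi`; each rank prints one line.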
> > Now, my problem is that if I try to run helloworld on both nodes, I get an error:
> >
> > mpirun -n 2 -host master,node001 --mca btl openib,self ./helloworldmpi
> >
--------------------------------------------------------------------------
> > At least one pair of MPI processes are unable to reach each other for
> > MPI communications. This means that no Open MPI device has indicated
> > that it can be used to communicate between these processes. This is
> > an error; Open MPI requires that all MPI processes be able to reach
> > each other. This error can sometimes be the result of forgetting to
> > specify the "self" BTL.
> >
> > Process 1 ([[5228,1],0]) is on host: hsaeed
> > Process 2 ([[5228,1],1]) is on host: node001
> > BTLs attempted: self
> >
> > Your MPI job is now going to abort; sorry.
> >
--------------------------------------------------------------------------
> >
--------------------------------------------------------------------------
> > It looks like MPI_INIT failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during MPI_INIT; some of which are due to configuration or environment
> > problems. This failure appears to be an internal failure; here's some
> > additional information (which may only be relevant to an Open MPI
> > developer):
> >
> > PML add procs failed
> > --> Returned "Unreachable" (-12) instead of "Success" (0)
> >
--------------------------------------------------------------------------
> > *** The MPI_Init() function was called before MPI_INIT was invoked.
> > *** This is disallowed by the MPI standard.
> > *** Your MPI job will now abort.
> > Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
> >
--------------------------------------------------------------------------
> > mpirun has exited due to process rank 0 with PID 7037 on
> > node xxxx exiting without calling "finalize". This may
> > have caused other processes in the application to be
> > terminated by signals sent by mpirun (as reported here).
> >
--------------------------------------------------------------------------
> > *** The MPI_Init() function was called before MPI_INIT was invoked.
> > *** This is disallowed by the MPI standard.
> > *** Your MPI job will now abort.
> > Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
> > 1 more process has sent help message help-mca-bml-r2.txt / unreachable proc
> > Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
> > 1 more process has sent help message help-mpi-runtime
> >
> >
> > I tried using:
> > mpirun -n 2 -host master,node001 --mca btl tcp,sm,self ./helloworldmpi
> > mpirun -n 2 -host master,node001 --mca btl openib,tcp,self ./helloworldmpi
> > etc.
> >
> > But no flag works.
> >
> >
> > Can someone reply with an idea?
> >
> > Thanks in advance.
> >
> > Regards--
> > --
> > _______________________________________________
> > Hamid Saeed
> > _______________________________________________
> >
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>
>
>
> --
>

--
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
-- 
_______________________________________________
Hamid Saeed
CoSynth GmbH & Co. KG
Escherweg 2 - 26121 Oldenburg - Germany
Tel +49 441 9722 738 | Fax -278
http://www.cosynth.com
_______________________________________________