Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] problem for multiple clusters using mpirun
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-03-21 08:52:23


Looks like you don't have an IB connection between "master" and "node001"
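
If that is the case, a quick way to confirm (a sketch, assuming plain TCP/Ethernet connectivity exists between the two hosts) is to take openib out of the picture and force the TCP BTL:

mpirun -n 2 -host master,node001 --mca btl tcp,self ./helloworldmpi

If that runs, the problem is confined to the InfiniBand side. You can check the HCA/port state on each host with ibstat or ibv_devinfo (assuming the IB diagnostic tools are installed), and adding --mca btl_base_verbose 30 to the failing mpirun line will show why the openib BTL is being excluded.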

On Mar 21, 2014, at 12:43 AM, Hamid Saeed <e.hamidsaeed_at_[hidden]> wrote:

> Hello All:
>
> I know there will be someone who can help me solve this problem.
>
> I can compile my helloworld.c program with mpicc, and I have confirmed that it runs correctly on another working cluster, so I believe the local paths are set up correctly and the program itself works.
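>
> For reference, helloworldmpi is essentially the standard MPI hello world (a minimal sketch; the exact source is assumed):
>
> #include <stdio.h>
> #include <mpi.h>
>
> int main(int argc, char *argv[])
> {
>     int rank, size;
>     MPI_Init(&argc, &argv);               /* initialize the MPI runtime */
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* rank of this process */
>     MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes */
>     printf("hello world from process %d of %d\n", rank, size);
>     MPI_Finalize();
>     return 0;
> }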
>
> If I execute mpirun from my master node, and using only the master node, helloworld executes correctly:
>
> mpirun -n 1 -host master --mca btl sm,openib,self ./helloworldmpi
> hello world from process 0 of 1
> If I execute mpirun from my master node, using only the worker node, helloworld executes correctly:
>
> mpirun -n 1 -host node001 --mca btl sm,openib,self ./helloworldmpi
> hello world from process 0 of 1
> Now, my problem is that if I try to run helloworld on both nodes, I get an error:
>
> mpirun -n 2 -host master,node001 --mca btl openib,self ./helloworldmpi
> --------------------------------------------------------------------------
> At least one pair of MPI processes are unable to reach each other for
> MPI communications. This means that no Open MPI device has indicated
> that it can be used to communicate between these processes. This is
> an error; Open MPI requires that all MPI processes be able to reach
> each other. This error can sometimes be the result of forgetting to
> specify the "self" BTL.
>
> Process 1 ([[5228,1],0]) is on host: hsaeed
> Process 2 ([[5228,1],1]) is on host: node001
> BTLs attempted: self
>
> Your MPI job is now going to abort; sorry.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** The MPI_Init() function was called before MPI_INIT was invoked.
> *** This is disallowed by the MPI standard.
> *** Your MPI job will now abort.
> Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 0 with PID 7037 on
> node xxxx exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
> *** The MPI_Init() function was called before MPI_INIT was invoked.
> *** This is disallowed by the MPI standard.
> *** Your MPI job will now abort.
> Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
> 1 more process has sent help message help-mca-bml-r2.txt / unreachable proc
> Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
> 1 more process has sent help message help-mpi-runtime
>
> I tried using
> mpirun -n 2 -host master,node001 --mca btl tcp,sm,self ./helloworldmpi
> mpirun -n 2 -host master,node001 --mca btl openib,tcp,self ./helloworldmpi
> etc.
>
> But none of these combinations works.
>
>
> Can someone reply with an idea?
>
> Thanks in advance.
>
> Regards--
> --
> _______________________________________________
> Hamid Saeed
> _______________________________________________
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users