Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OPEN MPI error
From: Gus Correa (gus_at_[hidden])
Date: 2013-09-18 17:48:45

Hi justa tester tester

Is your p2p1 interface an InfiniBand port, or is it Ethernet?
If it is Ethernet, try removing "--mca btl_openib_if_include p2p1"
from your mpiexec command line, because it would conflict with
the other MCA parameter you chose, "--mca btl openib,sm,self".
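
One way to answer the InfiniBand-or-Ethernet question on Linux is to read
the link type that the kernel exposes in sysfs. The following is only a
minimal sketch, assuming a Linux host where /sys/class/net is available.
The helper names link_type_name and iface_link_type are made up for
illustration, and the codes 1 (Ethernet) and 32 (InfiniBand) are the
ARPHRD values from the kernel's if_arp.h header.

```shell
#!/bin/sh
# Map a Linux ARPHRD link-type code to a readable name.
# 1 = ARPHRD_ETHER, 32 = ARPHRD_INFINIBAND (from <linux/if_arp.h>).
link_type_name() {
    case "$1" in
        1)  echo "ethernet" ;;
        32) echo "infiniband" ;;
        *)  echo "other" ;;
    esac
}

# Report the link type of a network interface by reading sysfs.
# Prints "missing" if the interface has no readable sysfs entry.
iface_link_type() {
    iface="$1"
    type_file="/sys/class/net/$iface/type"
    if [ -r "$type_file" ]; then
        link_type_name "$(cat "$type_file")"
    else
        echo "missing"
    fi
}

# p2p1 is the interface name used in the original command line.
iface_link_type p2p1
```

If this prints "ethernet", p2p1 is not an InfiniBand port. The warning in
the quoted output speaks of nonexistent OpenFabrics devices/ports, so it
may also help to run ibv_devices (if the OFED utilities are installed) and
compare the device names it lists against the name you passed.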

Simpler may be better: Have you tried to use just
"--mca btl openib,sm,self"?
This way OMPI will find the InfiniBand interface(s) for you.
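
For reference, the simplified command could look like the sketch below.
It is a dry run that only prints the command line instead of launching it.
The hostfile, benchmark path, and benchmark arguments are copied from the
original post, and the function name simplified_cmd is hypothetical.
Dropping btl_openib_cpc_include and btl_openib_verbose as well is one
reading of "just", not a requirement.

```shell
#!/bin/sh
# Dry run: print the simplified mpirun command rather than executing it.
# Only the BTL selection is kept; the openib-specific MCA options
# (if_include, cpc_include, verbose) from the original post are dropped.
simplified_cmd() {
    echo mpirun --mca btl openib,sm,self \
        -np 2 -hostfile hosts \
        ./3.2.4/src/IMB-MPI1 -npmin 2 -iter 10 PingPong
}

simplified_cmd
```

If openib still does not show up under "BTLs attempted", the error text
quoted below suggests checking the output of ompi_info for the available
BTL plugins and setting btl_base_verbose to 100.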

Justa guessed guess,
Gus Correa

On 09/18/2013 01:49 PM, justa tester tester wrote:
> I'm new to Open MPI and have a question regarding the error I'm
> seeing after compiling the OFED stack (version 3.5) to facilitate RDMA,
> building Open MPI with the OFED install path specified, and installing
> the Intel MPI Benchmark. I was able to run over tcp, but when running
> with openib we could not run successfully. We see the error below:
> [root_at_dhcp-8-168 imb]# mpirun --mca btl openib,sm,self --mca
> btl_openib_cpc_include rdmacm --mca btl_openib_if_include p2p1 --mca
> btl_openib_verbose 2 -np 2 -hostfile hosts ./3.2.4/src/IMB-MPI1 -npmin 2
> -iter 10 PingPong
> --------------------------------------------------------------------------
> WARNING: One or more nonexistent OpenFabrics devices/ports were
> specified:
> Host: dhcp-8-168
> MCA parameter: mca_btl_if_include
> Nonexistent entities: p2p1
> These entities will be ignored. You can disable this warning by
> setting the btl_openib_warn_nonexistent_if MCA parameter to 0.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> At least one pair of MPI processes are unable to reach each other for
> MPI communications. This means that no Open MPI device has indicated
> that it can be used to communicate between these processes. This is
> an error; Open MPI requires that all MPI processes be able to reach
> each other. This error can sometimes be the result of forgetting to
> specify the "self" BTL.
> Process 1 ([[60771,1],0]) is on host: dhcp-8-168
> Process 2 ([[60771,1],1]) is on host: 169
> BTLs attempted: self sm
> Your MPI job is now going to abort; sorry.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> MPI_INIT has failed because at least one MPI process is unreachable
> from another. This *usually* means that an underlying communication
> plugin -- such as a BTL or an MTL -- has either not loaded or not
> allowed itself to be used. Your MPI job will now abort.
> You may wish to try to narrow down the problem;
> * Check the output of ompi_info to see which BTL/MTL plugins are
> available.
> * Run your application with MPI_THREAD_SINGLE.
> * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
> if using MTL-based communications) to see exactly which
> communication plugins were considered and/or discarded.
> --------------------------------------------------------------------------
> [dhcp-8-168:3503] *** An error occurred in MPI_Init
> [dhcp-8-168:3503] *** on a NULL communicator
> [dhcp-8-168:3503] *** Unknown error
> [dhcp-8-168:3503] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
> --------------------------------------------------------------------------
> An MPI process is aborting at a time when it cannot guarantee that all
> of its peer processes in the job will be killed properly. You should
> double check that everything has shut down cleanly.
> Reason: Before MPI_INIT completed
> Local host: dhcp-8-168
> PID: 3503
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 0 with PID 3503 on
> node dhcp-8-168 exiting improperly. There are two reasons this could occur:
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
> This may have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
> [dhcp-8-168:03501] 1 more process has sent help message
> help-mpi-btl-openib.txt / nonexistent port
> [dhcp-8-168:03501] Set MCA parameter "orte_base_help_aggregate" to 0 to
> see all help / error messages
> [dhcp-8-168:03501] 1 more process has sent help message
> help-mca-bml-r2.txt / unreachable proc
> [dhcp-8-168:03501] 1 more process has sent help message help-mpi-runtime
> / mpi_init:startup:pml-add-procs-fail
> [dhcp-8-168:03501] 1 more process has sent help message
> help-mpi-errors.txt / mpi_errors_are_fatal unknown handle
> [dhcp-8-168:03501] 1 more process has sent help message
> help-mpi-runtime.txt / ompi mpi abort:cannot guarantee all killed
> --Tester