
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Trouble with PSM "Could not detect network connectivity"
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-11-02 12:03:02


Try configuring with --without-psm.

That should solve the problem. The build is probably picking up the PSM libraries installed on the machine, even though it looks like you aren't actually running a PSM-capable network.

And yes - PSM should gracefully disable itself when the network is unavailable. You might check the 1.6 series to see if it behaves better - if not, we should fix it.
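A sketch of what that rebuild might look like (the source directory and install prefix below are assumptions, shown only to illustrate where the flag goes):

```shell
# Rebuild Open MPI with PSM support disabled at configure time,
# so the psm MTL component is never compiled in.
# Paths here are hypothetical - adjust to your own tree and prefix.
cd openmpi-1.4.3
./configure --prefix=/usr/mpi/intel/openmpi-1.4.3 --without-psm
make -j4 all install
```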
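On the question of why `--mca btl self,sm` didn't keep PSM out: PSM is an MTL component (selected via the `cm` PML), not a BTL, so restricting the BTL list doesn't exclude it. If rebuilding isn't convenient, excluding it at run time should also work; this is a sketch reusing the command from the quoted message, and its behavior on 1.4.3 specifically is an assumption:

```shell
# Exclude the psm MTL at run time; the "^" prefix means "everything except".
/usr/mpi/intel/openmpi-1.4.3/bin/mpirun -n 1 \
    --mca mtl ^psm --mca btl self,sm \
    /release/cfd/simgrid/P_OPT.LINUX64

# Equivalently, force the ob1 PML, which uses only BTLs and so never
# touches PSM at all:
/usr/mpi/intel/openmpi-1.4.3/bin/mpirun -n 1 \
    --mca pml ob1 --mca btl self,sm \
    /release/cfd/simgrid/P_OPT.LINUX64
```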

On Nov 2, 2012, at 8:49 AM, "Blosch, Edwin L" <edwin.l.blosch_at_[hidden]> wrote:

> I am getting a problem where something called "PSM" is failing to start, which in turn prevents my job from running. Command and output are below. I would like to understand what's going on. Apparently this build of Open MPI was configured with support for PSM, but if PSM isn't available, why fail when other transports are? Also, I think my command tells Open MPI not to use anything but self and sm, so why would it try to use PSM?
>
> Thanks in advance for any help...
>
> user_at_machinename:~> /usr/mpi/intel/openmpi-1.4.3/bin/ompi_info -all | grep psm
> MCA mtl: psm (MCA v2.0, API v2.0, Component v1.4.3)
> MCA mtl: parameter "mtl_psm_connect_timeout" (current value: "180", data source: default value)
> MCA mtl: parameter "mtl_psm_debug" (current value: "1", data source: default value)
> MCA mtl: parameter "mtl_psm_ib_unit" (current value: "-1", data source: default value)
> MCA mtl: parameter "mtl_psm_ib_port" (current value: "0", data source: default value)
> MCA mtl: parameter "mtl_psm_ib_service_level" (current value: "0", data source: default value)
> MCA mtl: parameter "mtl_psm_ib_pkey" (current value: "32767", data source: default value)
> MCA mtl: parameter "mtl_psm_priority" (current value: "0", data source: default value)
>
> Here is my command:
>
> /usr/mpi/intel/openmpi-1.4.3/bin/mpirun -n 1 --mca btl_base_verbose 30 --mca btl self,sm /release/cfd/simgrid/P_OPT.LINUX64
>
> and here is the output:
>
> [machinename:01124] mca: base: components_open: Looking for btl components
> [machinename:01124] mca: base: components_open: opening btl components
> [machinename:01124] mca: base: components_open: found loaded component self
> [machinename:01124] mca: base: components_open: component self has no register function
> [machinename:01124] mca: base: components_open: component self open function successful
> [machinename:01124] mca: base: components_open: found loaded component sm
> [machinename:01124] mca: base: components_open: component sm has no register function
> [machinename:01124] mca: base: components_open: component sm open function successful
> machinename.1124ipath_userinit: assign_context command failed: Network is down
> machinename.1124can't open /dev/ipath, network down (err=26)
> --------------------------------------------------------------------------
> PSM was unable to open an endpoint. Please make sure that the network link is
> active on the node and the hardware is functioning.
>
> Error: Could not detect network connectivity
> --------------------------------------------------------------------------
> [machinename:01124] mca: base: close: component self closed
> [machinename:01124] mca: base: close: unloading component self
> [machinename:01124] mca: base: close: component sm closed
> [machinename:01124] mca: base: close: unloading component sm
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Error" (-1) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** The MPI_Init() function was called before MPI_INIT was invoked.
> *** This is disallowed by the MPI standard.
> *** Your MPI job will now abort.
>