Open MPI User's Mailing List Archives

From: Jon Mason (jon_at_[hidden])
Date: 2007-11-02 12:38:38


On Thu, Nov 01, 2007 at 07:41:33PM -0400, George Bosilca wrote:
> There are two things reflected in your email.
>
> 1. You can run Open MPI (or at least ompi_info) on the head node, and
> udapl is in the list of BTLs. This means the head node has all the
> libraries required to load udapl, and your LD_LIBRARY_PATH is
> correctly configured on the head node.
>
> 2. When running between vic12-10g and vic20-10g, udapl cannot be
> loaded or refuses to load. This can mean two things: either some of
> the shared libraries are missing or not in the LD_LIBRARY_PATH, or,
> once initialized, udapl detects that the connection to the remote
> node is impossible.
>
> The next thing to do is to test that your LD_LIBRARY_PATH is set
> correctly for non-interactive shells on each node in the cluster
> (which means it contains not only the path to the Open MPI libraries
> but also the path to the udapl libraries). A "ssh vic12-10g printenv |
> grep LD_LIBRARY_PATH" should give you the answer.

Thanks for the help. Per your request, I get the following:
# ssh vic12-10g printenv | grep LD
LD_LIBRARY_PATH=/usr/mpi/gcc/openmpi-1.2-svn/lib64:

That directory contains the btl udapl libraries, as you said.
# ls -R /usr/mpi/gcc/openmpi-1.2-svn/lib64/ | grep dapl
mca_btl_udapl.la
mca_btl_udapl.so
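
The component files are clearly there, so I can also try turning up the
BTL verbosity to see whether the udapl component is even being opened
and why it gets rejected. A rough sketch, assuming the usual
btl_base_verbose MCA parameter applies to this build:

# mpirun -np 2 --host vic12-10g,vic20-10g -mca btl udapl,self \
    -mca btl_base_verbose 100 \
    /usr/mpi/gcc/open*/tests/IMB*/IMB-MPI1 pingpong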

A search on the system shows libdaplcma and libdat in /usr/lib/. For
giggles, I added /usr/lib to the LD_LIBRARY_PATH, but the program still
fails to run with the same error.
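
Since whatever I export interactively may not reach the remotely
started processes, I could also push the augmented path out explicitly
with mpirun's -x option. Just a sketch, assuming -x exports the
variable to the remote ranks as documented (the /usr/lib piece is only
there because that is where libdat and libdaplcma turned up):

# export LD_LIBRARY_PATH=/usr/mpi/gcc/openmpi-1.2-svn/lib64:/usr/lib
# mpirun -np 2 --host vic12-10g,vic20-10g -x LD_LIBRARY_PATH \
    -mca btl udapl,self /usr/mpi/gcc/open*/tests/IMB*/IMB-MPI1 pingpong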

I believe I have the correct RPMs installed for the libraries. Here is
what I have on the systems:
# rpm -qa | grep dapl
dapl-devel-1.2.1-0
dapl-1.2.1-0
dapl-utils-1.2.1-0

What should I be looking to link against?
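
Would checking the component's own dependencies with ldd tell me? A
sketch (the find is only there so I don't have to guess the exact
component subdirectory):

# ldd $(find /usr/mpi/gcc/openmpi-1.2-svn/lib64 -name mca_btl_udapl.so) | grep -E 'dat|dapl'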

Thanks,
Jon

>
> Thanks,
> george.
>
> On Nov 1, 2007, at 6:52 PM, Jon Mason wrote:
>
> >On Wed, Oct 31, 2007 at 06:45:10PM -0400, Tim Prins wrote:
> >>Hi Jon,
> >>
> >>Just to make sure: does running 'ompi_info' show that you have the
> >>udapl btl installed?
> >
> >Yes, I get the following:
> ># ompi_info | grep dapl
> > MCA btl: udapl (MCA v1.0, API v1.0, Component v1.2.5)
> >
> >If I do not include "self" in the btl list, then I get an error
> >saying it cannot find the btl component:
> >
> ># mpirun --n 2 --host vic12-10g,vic20-10g -mca btl udapl /usr/mpi/gcc/openmpi-1.2.2-1/tests/IMB-2.3/IMB-MPI1 pingpong
> >--------------------------------------------------------------------------
> >No available btl components were found!
> >
> >This means that there are no components of this type installed on your
> >system or all the components reported that they could not be used.
> >
> >This is a fatal error; your MPI process is likely to abort. Check the
> >output of the "ompi_info" command and ensure that components of this
> >type are available on your system. You may also wish to check the
> >value of the "component_path" MCA parameter and ensure that it has at
> >least one directory that contains valid MCA components.
> >
> >--------------------------------------------------------------------------
> >mpirun noticed that job rank 1 with PID 4335 on node vic20-10g exited
> >on signal 15 (Terminated).
> >
> ># ompi_info --all | grep component_path
> > MCA mca: parameter "mca_component_path" (current value:
> >"/usr/mpi/gcc/openmpi-1.2-svn/lib/openmpi:/root/.openmpi/components")
> >
> ># ls /usr/mpi/gcc/openmpi-1.2-svn/lib/openmpi | grep dapl
> >mca_btl_udapl.la
> >mca_btl_udapl.so
> >
> >So it looks to me like it should be finding it, but perhaps I am
> >lacking
> >something in my configuration. Any ideas?
> >
> >Thanks,
> >Jon
> >
> >
> >>
> >>Tim
> >>
> >>On Wednesday 31 October 2007 06:11:39 pm Jon Mason wrote:
> >>>I am having a bit of a problem getting udapl to work via mpirun
> >>>(over Open MPI, obviously). I am running a basic pingpong test and
> >>>I get the following error.
> >>>
> >>># mpirun --n 2 --host vic12-10g,vic20-10g -mca btl udapl,self
> >>>/usr/mpi/gcc/open*/tests/IMB*/IMB-MPI1 pingpong
> >>>--------------------------------------------------------------------------
> >>>Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
> >>>If you specified the use of a BTL component, you may have
> >>>forgotten a component (such as "self") in the list of
> >>>usable components.
> >>>--------------------------------------------------------------------------
> >>>--------------------------------------------------------------------------
> >>>It looks like MPI_INIT failed for some reason; your parallel
> >>>process is
> >>>likely to abort. There are many reasons that a parallel process can
> >>>fail during MPI_INIT; some of which are due to configuration or
> >>>environment
> >>>problems. This failure appears to be an internal failure; here's
> >>>some
> >>>additional information (which may only be relevant to an Open MPI
> >>>developer):
> >>>
> >>> PML add procs failed
> >>> --> Returned "Unreachable" (-12) instead of "Success" (0)
> >>>--------------------------------------------------------------------------
> >>>*** An error occurred in MPI_Init
> >>>*** before MPI was initialized
> >>>*** MPI_ERRORS_ARE_FATAL (goodbye)
> >>>--------------------------------------------------------------------------
> >>>Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
> >>>If you specified the use of a BTL component, you may have
> >>>forgotten a component (such as "self") in the list of
> >>>usable components.
> >>>--------------------------------------------------------------------------
> >>>--------------------------------------------------------------------------
> >>>It looks like MPI_INIT failed for some reason; your parallel
> >>>process is
> >>>likely to abort. There are many reasons that a parallel process can
> >>>fail during MPI_INIT; some of which are due to configuration or
> >>>environment
> >>>problems. This failure appears to be an internal failure; here's
> >>>some
> >>>additional information (which may only be relevant to an Open MPI
> >>>developer):
> >>>
> >>> PML add procs failed
> >>> --> Returned "Unreachable" (-12) instead of "Success" (0)
> >>>--------------------------------------------------------------------------
> >>>*** An error occurred in MPI_Init
> >>>*** before MPI was initialized
> >>>*** MPI_ERRORS_ARE_FATAL (goodbye)
> >>>
> >>>
> >>>
> >>>The command is successful if udapl is replaced with tcp or
> >>>openib. So I
> >>>think my setup is correct. Also, dapltest successfully completes
> >>>without any problems over IB or iWARP.
> >>>
> >>>Any thoughts or suggestions would be greatly appreciated.
> >>>
> >>>Thanks,
> >>>Jon
> >>>