
Open MPI User's Mailing List Archives


From: George Bosilca (bosilca_at_[hidden])
Date: 2007-11-01 19:41:33


There are two things reflected in your email.

1. You can run Open MPI (or at least ompi_info) on the head node, and
udapl is in the list of BTLs. This means the head node has all the
libraries required to load udapl, and your LD_LIBRARY_PATH is
correctly configured there.

2. When running between vic12-10g and vic20-10g, udapl cannot (or
refuses to) be loaded. This can mean two things: either some of the
shared libraries are missing or not in the LD_LIBRARY_PATH, or, once
initialized, udapl detects that the connection to the remote node is
impossible. A quick way to check the first case is sketched below.
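
For example (just a sketch; the path comes from the ompi_info output
quoted below, so adjust it to your installation), running ldd on the
udapl BTL on a compute node will show any unresolved dependencies:

   ssh vic12-10g ldd /usr/mpi/gcc/openmpi-1.2-svn/lib/openmpi/mca_btl_udapl.so | grep "not found"

Any "not found" entry means that library is not visible to
non-interactive shells on that node.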

The next thing to do is to test that your LD_LIBRARY_PATH is correctly
set for non-interactive shells on each node in the cluster (which means
it contains not only the path to the Open MPI libraries but also the
path to the udapl libraries). A "ssh vic12-10g printenv | grep
LD_LIBRARY_PATH" should give you the answer; a sketch for checking all
nodes at once follows below.

   Thanks,
     george.

On Nov 1, 2007, at 6:52 PM, Jon Mason wrote:

> On Wed, Oct 31, 2007 at 06:45:10PM -0400, Tim Prins wrote:
>> Hi Jon,
>>
>> Just to make sure, running 'ompi_info' shows that you have the
>> udapl btl installed?
>
> Yes, I get the following:
> # ompi_info | grep dapl
> MCA btl: udapl (MCA v1.0, API v1.0, Component v1.2.5)
>
> If I do not include "self" in the mca, then I get an error saying it
> cannot find the btl component:
>
> # mpirun --n 2 --host vic12-10g,vic20-10g -mca btl udapl /usr/mpi/gcc/openmpi-1.2.2-1/tests/IMB-2.3/IMB-MPI1 pingpong
> --------------------------------------------------------------------------
> No available btl components were found!
>
> This means that there are no components of this type installed on your
> system or all the components reported that they could not be used.
>
> This is a fatal error; your MPI process is likely to abort. Check the
> output of the "ompi_info" command and ensure that components of this
> type are available on your system. You may also wish to check the
> value of the "component_path" MCA parameter and ensure that it has at
> least one directory that contains valid MCA components.
>
> --------------------------------------------------------------------------
> mpirun noticed that job rank 1 with PID 4335 on node vic20-10g exited on
> signal 15 (Terminated).
>
> # ompi_info --all | grep component_path
> MCA mca: parameter "mca_component_path" (current value:
> "/usr/mpi/gcc/openmpi-1.2-svn/lib/openmpi:/root/.openmpi/components")
>
> # ls /usr/mpi/gcc/openmpi-1.2-svn/lib/openmpi | grep dapl
> mca_btl_udapl.la
> mca_btl_udapl.so
>
> So it looks to me like it should be finding it, but perhaps I am
> lacking something in my configuration. Any ideas?
>
> Thanks,
> Jon
>
>
>>
>> Tim
>>
>> On Wednesday 31 October 2007 06:11:39 pm Jon Mason wrote:
>>> I am having a bit of a problem getting udapl to work via mpirun (over
>>> open-mpi, obviously). I am running a basic pingpong test and I get the
>>> following error.
>>>
>>> # mpirun --n 2 --host vic12-10g,vic20-10g -mca btl udapl,self
>>> /usr/mpi/gcc/open*/tests/IMB*/IMB-MPI1 pingpong
>>> --------------------------------------------------------------------------
>>> Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
>>> If you specified the use of a BTL component, you may have
>>> forgotten a component (such as "self") in the list of
>>> usable components.
>>> --------------------------------------------------------------------------
>>> --------------------------------------------------------------------------
>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during MPI_INIT; some of which are due to configuration or environment
>>> problems. This failure appears to be an internal failure; here's some
>>> additional information (which may only be relevant to an Open MPI
>>> developer):
>>>
>>> PML add procs failed
>>> --> Returned "Unreachable" (-12) instead of "Success" (0)
>>> --------------------------------------------------------------------------
>>> *** An error occurred in MPI_Init
>>> *** before MPI was initialized
>>> *** MPI_ERRORS_ARE_FATAL (goodbye)
>>> --------------------------------------------------------------------------
>>> Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
>>> If you specified the use of a BTL component, you may have
>>> forgotten a component (such as "self") in the list of
>>> usable components.
>>> --------------------------------------------------------------------------
>>> --------------------------------------------------------------------------
>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during MPI_INIT; some of which are due to configuration or environment
>>> problems. This failure appears to be an internal failure; here's some
>>> additional information (which may only be relevant to an Open MPI
>>> developer):
>>>
>>> PML add procs failed
>>> --> Returned "Unreachable" (-12) instead of "Success" (0)
>>> --------------------------------------------------------------------------
>>> *** An error occurred in MPI_Init
>>> *** before MPI was initialized
>>> *** MPI_ERRORS_ARE_FATAL (goodbye)
>>>
>>>
>>>
>>> The command is successful if udapl is replaced with tcp or openib. So I
>>> think my setup is correct. Also, dapltest successfully completes
>>> without any problems over IB or iWARP.
>>>
>>> Any thoughts or suggestions would be greatly appreciated.
>>>
>>> Thanks,
>>> Jon
>>>
>>
>>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users


