
Open MPI Development Mailing List Archives


From: Ralph H Castain (rhc_at_[hidden])
Date: 2007-07-19 11:24:18


You are correct - I misread the note. My bad.

I'll look at how we might ensure the LD_LIBRARY_PATH shows up correctly -
shouldn't be a big deal.
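
One possible direction (a sketch only, not the actual change; the prefix
path is just the one from this thread): have the local fork/exec path give
the child the same LD_LIBRARY_PATH treatment the rsh launcher already gives
remote daemons:

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void)
    {
        const char *prefix_lib = "/home/glebn/openmpi/lib";

        pid_t pid = fork();
        if (0 == pid) {
            /* child: prepend the prefix lib dir before exec'ing */
            const char *oldenv = getenv("LD_LIBRARY_PATH");
            char *newenv;
            if (NULL != oldenv) {
                if (asprintf(&newenv, "%s:%s", prefix_lib, oldenv) < 0) {
                    _exit(1);
                }
            } else {
                newenv = strdup(prefix_lib);
                if (NULL == newenv) {
                    _exit(1);
                }
            }
            setenv("LD_LIBRARY_PATH", newenv, 1);
            free(newenv);
            execlp("printenv", "printenv", "LD_LIBRARY_PATH", (char *)NULL);
            _exit(127);  /* only reached if exec failed */
        }
        waitpid(pid, NULL, 0);
        return 0;
    }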

On 7/19/07 9:12 AM, "George Bosilca" <bosilca_at_[hidden]> wrote:

> The second execution (the one you refer to) is the one that works
> fine. The failing one is the first one, where LD_LIBRARY_PATH is not
> provided. As Gleb indicated, using localhost makes the problem vanish.
>
> george.
>
> On Jul 19, 2007, at 10:57 AM, Ralph H Castain wrote:
>
>> But it *does* provide an LD_LIBRARY_PATH that points to your Open MPI
>> installation - it says so right here in your debug output:
>>
>>>>> [elfit1:14752] pls:rsh: reset LD_LIBRARY_PATH: /home/glebn/openmpi/lib
>>
>> I suspect that the problem isn't in the launcher, but rather in the
>> iof again. Why don't we wait until those fixes come into the trunk
>> before chasing our tails any further?
>>
>>
>> On 7/19/07 8:18 AM, "Gleb Natapov" <glebn_at_[hidden]> wrote:
>>
>>> On Thu, Jul 19, 2007 at 08:07:51AM -0600, Ralph H Castain wrote:
>>>> Interesting. Apparently, it is getting a NULL back when it tries to
>>>> access the LD_LIBRARY_PATH in your environment. Here is the code
>>>> involved:
>>>>
>>>> newenv = opal_os_path( false, prefix_dir, lib_base, NULL );
>>>> oldenv = getenv("LD_LIBRARY_PATH");
>>>> if (NULL != oldenv) {
>>>>     char* temp;
>>>>     asprintf(&temp, "%s:%s", newenv, oldenv);
>>>>     free(newenv);
>>>>     newenv = temp;
>>>> }
>>>> opal_setenv("LD_LIBRARY_PATH", newenv, true, &env);
>>>> if (mca_pls_rsh_component.debug) {
>>>>     opal_output(0, "pls:rsh: reset LD_LIBRARY_PATH: %s", newenv);
>>>> }
>>>> free(newenv);
>>>>
>>>> So you can see that the only way we can get your debugging output is
>>>> for the LD_LIBRARY_PATH in your starting environment to be NULL. Note
>>>> that this comes after we fork, so we are talking about the child
>>>> process - not sure that matters, but may as well point it out.
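
For reference, here is a minimal standalone sketch of the snippet above,
with the OPAL internals (opal_os_path, opal_setenv, opal_output) swapped
for plain libc equivalents and the prefix dir taken from Gleb's debug
output. It makes both cases visible:

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Mirrors the quoted pls:rsh logic: start from the prefix lib dir,
     * then prepend it to any pre-existing LD_LIBRARY_PATH. */
    static void show_reset(void)
    {
        char *newenv = strdup("/home/glebn/openmpi/lib");
        if (NULL == newenv) {
            return;
        }
        const char *oldenv = getenv("LD_LIBRARY_PATH");
        if (NULL != oldenv) {
            char *temp;
            if (asprintf(&temp, "%s:%s", newenv, oldenv) < 0) {
                free(newenv);
                return;
            }
            free(newenv);
            newenv = temp;
        }
        printf("pls:rsh: reset LD_LIBRARY_PATH: %s\n", newenv);
        free(newenv);
    }

    int main(void)
    {
        unsetenv("LD_LIBRARY_PATH");
        show_reset();  /* prints only the prefix dir - Gleb's exact line */

        setenv("LD_LIBRARY_PATH", "/usr/local/lib", 1);
        show_reset();  /* prints "/home/glebn/openmpi/lib:/usr/local/lib" */
        return 0;
    }

Had LD_LIBRARY_PATH held any value at all, the printed path would have
carried a ":<old value>" suffix, which Gleb's output lacks - hence the
NULL conclusion.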
>>>>
>>>> So the question is: why do you not have LD_LIBRARY_PATH set in your
>>>> environment when you provide a different hostname?
>>> Right, I don't have LD_LIBRARY_PATH set in my environment, but I
>>> expect mpirun to provide a working environment for all ranks, not
>>> just the remote ones. This is how it worked before. Perhaps that was
>>> a bug, but it was a useful bug :)
>>>
>>>>
>>>>
>>>> On 7/19/07 7:45 AM, "Gleb Natapov" <glebn_at_[hidden]> wrote:
>>>>
>>>>> On Wed, Jul 18, 2007 at 09:08:38PM +0300, Gleb Natapov wrote:
>>>>>> On Wed, Jul 18, 2007 at 09:08:47AM -0600, Ralph H Castain wrote:
>>>>>>> But this will lock up:
>>>>>>>
>>>>>>> pn1180961:~/openmpi/trunk rhc$ mpirun -n 1 -host pn1180961 printenv | grep LD
>>>>>>>
>>>>>>> The reason is that the hostname in this last command doesn't match
>>>>>>> the hostname I get when I query my interfaces, so mpirun thinks it
>>>>>>> must be a remote host - and so we get stuck in ssh until it times
>>>>>>> out. Which could be quick on your machine, but takes a while for me.
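
What Ralph describes is, at bottom, a name comparison against the local
host. Purely as an illustration (this is not the Open MPI code, and
is_local_by_name is a made-up helper), a literal string compare shows how
several names that resolve to the same machine can still be classified
differently:

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <limits.h>

    #ifndef HOST_NAME_MAX
    #define HOST_NAME_MAX 255
    #endif

    /* Made-up helper: does the requested node name literally match the
     * local hostname? "localhost", an FQDN, or an alias for the same IP
     * all fail this compare and get routed to the daemon-launch path
     * (which resets the environment), while the exact `hostname` string
     * is treated as already covered - no new daemon, no env reset. */
    static int is_local_by_name(const char *nodename)
    {
        char local[HOST_NAME_MAX + 1];
        if (gethostname(local, sizeof(local)) != 0) {
            return 0;
        }
        return 0 == strcmp(nodename, local);
    }

    int main(int argc, char **argv)
    {
        const char *node = (argc > 1) ? argv[1] : "localhost";
        printf("%s -> %s\n", node,
               is_local_by_name(node)
                   ? "matches local hostname (no new daemon launched)"
                   : "no match (daemon-launch path, env gets reset)");
        return 0;
    }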
>>>>>>>
>>>>>> This is not my case. mpirun resolves the hostname and runs env,
>>>>>> but LD_LIBRARY_PATH is not there. If I use the full name, like this:
>>>>>> # /home/glebn/openmpi/bin/mpirun -np 1 -H elfit1.voltaire.com env | grep LD_LIBRARY_PATH
>>>>>> LD_LIBRARY_PATH=/home/glebn/openmpi/lib
>>>>>>
>>>>>> everything is OK.
>>>>>>
>>>>> More info: if I give mpirun the hostname as returned by the
>>>>> "hostname" command, LD_LIBRARY_PATH is not set:
>>>>> # /home/glebn/openmpi/bin/mpirun -np 1 -H `hostname` env | grep LD
>>>>> OLDPWD=/home/glebn/OpenMPI/ompi-tests/intel_tests
>>>>>
>>>>> If I give any other name that resolves to the same IP, then
>>>>> LD_LIBRARY_PATH is set:
>>>>> # /home/glebn/openmpi/bin/mpirun -np 1 -H localhost env | grep LD
>>>>> OLDPWD=/home/glebn/OpenMPI/ompi-tests/intel_tests
>>>>> LD_LIBRARY_PATH=/home/glebn/openmpi/lib
>>>>>
>>>>> Here is the debug output of the "bad" run:
>>>>> /home/glebn/openmpi/bin/mpirun -np 1 -H `hostname` -mca pls_rsh_debug 1 echo
>>>>> [elfit1:14730] pls:rsh: launching job 1
>>>>> [elfit1:14730] pls:rsh: no new daemons to launch
>>>>>
>>>>> Here is the good one:
>>>>> /home/glebn/openmpi/bin/mpirun -np 1 -H localhost -mca pls_rsh_debug 1 echo
>>>>> [elfit1:14752] pls:rsh: launching job 1
>>>>> [elfit1:14752] pls:rsh: local csh: 0, local sh: 1
>>>>> [elfit1:14752] pls:rsh: assuming same remote shell as local shell
>>>>> [elfit1:14752] pls:rsh: remote csh: 0, remote sh: 1
>>>>> [elfit1:14752] pls:rsh: final template argv:
>>>>> [elfit1:14752] pls:rsh: /usr/bin/ssh <template> orted --name <template> --num_procs 1 --vpid_start 0 --nodename <template> --universe root_at_elfit1:default-universe-14752 --nsreplica "0.0.0;tcp://172.30.7.187:43017;tcp://192.168.7.187:43017" --gprreplica "0.0.0;tcp://172.30.7.187:43017;tcp://192.168.7.187:43017" -mca mca_base_param_file_path /home/glebn/openmpi//share/openmpi/amca-param-sets:/home/USERS/glebn/openmpiwd -mca mca_base_param_file_path_force /home/USERS/glebn/openmpiwd
>>>>> [elfit1:14752] pls:rsh: launching on node localhost
>>>>> [elfit1:14752] pls:rsh: localhost is a LOCAL node
>>>>> [elfit1:14752] pls:rsh: reset PATH: /home/glebn/openmpi/bin:/home/USERS/lenny/MPI/mpi/bin:/opt/vltmpi/OPENIB/mpi/bin:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
>>>>> [elfit1:14752] pls:rsh: reset LD_LIBRARY_PATH: /home/glebn/openmpi/lib
>>>>> [elfit1:14752] pls:rsh: changing to directory /root
>>>>> [elfit1:14752] pls:rsh: executing: (/home/glebn/openmpi/bin/orted) [orted --name 0.0.1 --num_procs 1 --vpid_start 0 --nodename localhost --universe root_at_elfit1:default-universe-14752 --nsreplica "0.0.0;tcp://172.30.7.187:43017;tcp://192.168.7.187:43017" --gprreplica "0.0.0;tcp://172.30.7.187:43017;tcp://192.168.7.187:43017" -mca mca_base_param_file_path /home/glebn/openmpi//share/openmpi/amca-param-sets:/home/USERS/glebn/openmpiwd -mca mca_base_param_file_path_force /home/USERS/glebn/openmpiwd --set-sid]
>>>>>
>>>>> --
>>>>> Gleb.
>>>
>>> --
>>> Gleb.