Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Beginner's question: why multiple sends or receives don't work?
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2011-02-25 10:04:05


Check that a) your .bashrc is actually executing when you run "ssh othernode env", and b) if .bashrc is executing, that it isn't exiting prematurely for non-interactive jobs.
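
To illustrate point b), here is a minimal sketch. It assumes a startup file that begins with an early-return guard for non-interactive shells, as many distribution default .bashrc files (Ubuntu's among them) do. Anything exported after such a guard never reaches a command run via "ssh othernode env". The file names and the /opt/openmpi/lib path are hypothetical, chosen only for the demonstration.

```shell
#!/bin/sh
# Sketch: show how an early "return" guard in a startup file hides exports
# placed after it. The rc file names and /opt/openmpi/lib are hypothetical.
tmpdir=$(mktemp -d)

# Variant 1: export placed AFTER the non-interactive guard (never reached).
cat > "$tmpdir/rc_after" <<'EOF'
[ -z "$PS1" ] && return
export LD_LIBRARY_PATH=/opt/openmpi/lib
EOF

# Variant 2: export placed BEFORE the guard (always reached).
cat > "$tmpdir/rc_before" <<'EOF'
export LD_LIBRARY_PATH=/opt/openmpi/lib
[ -z "$PS1" ] && return
EOF

# Source each file in a child shell with PS1 unset, which mimics the
# non-interactive shell that a remote ssh command gets.
after=$(sh -c 'unset PS1 LD_LIBRARY_PATH; . "$1"; echo "${LD_LIBRARY_PATH:-unset}"' _ "$tmpdir/rc_after")
before=$(sh -c 'unset PS1 LD_LIBRARY_PATH; . "$1"; echo "${LD_LIBRARY_PATH:-unset}"' _ "$tmpdir/rc_before")

echo "export after guard:  $after"
echo "export before guard: $before"

rm -rf "$tmpdir"
```

If your own .bashrc has a guard like this, moving the LD_LIBRARY_PATH export above it is one way to make the setting visible to non-interactive ssh sessions.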

On Feb 25, 2011, at 9:58 AM, Xianglong Kong wrote:

> I'm using Open MPI 1.4.3. The cluster consists of two desktops with Intel
> Core 2 Duo CPUs running Ubuntu 10.04.
>
> A weird thing I found is that when I issued the command "env |
> grep LD_LIBRARY_PATH" on the slave node, it showed the MPI lib path.
> But when I issued the command "ssh slave-node env | grep
> LD_LIBRARY_PATH" on the master side to check the LD_LIBRARY_PATH of
> the slave node, it showed nothing. Also, issuing the command "ssh
> master-node env | grep LD_LIBRARY_PATH" on the slave side would
> return the correct MPI lib path.
>
> I tried to modify the .bashrc and the /etc/ld.so.conf.d/*.conf file to
> configure the LD_LIBRARY_PATH on the slave node, but it seems to work
> only locally. How can I set the LD_LIBRARY_PATH on the slave node
> side, so that I can get the correct path when I use "ssh slave-node
> env | grep LD_LIBRARY_PATH" on the master side?
>
> Kong
>
> On Wed, Feb 23, 2011 at 5:22 PM, Bill Rankin <Bill.Rankin_at_[hidden]> wrote:
>> Jeff:
>>
>>> FWIW: I have rarely seen this to be the issue.
>>
>> Been bitten by similar situations before. But it may not have been OpenMPI. In any case it was a while back.
>>
>>> In short, programs are erroneous that do not guarantee that all their
>>> outstanding requests have completed before calling finalize.
>>
>> Agreed 100%. The barrier won't prevent the case of unmatched sends/receives or outstanding request handles, but if the logic is correct it does make sure that everyone completes before anyone leaves.
>>
>> In any case, I also tried code #2 and it completed w/o issue on our cluster. I guess the next thing to ask Kong is which version he is running and on what platform.
>>
>> -b
>>
>>
>>
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
>
>
> --
> Xianglong Kong
> Department of Mechanical Engineering
> University of Rochester
> Phone: (585)520-4412
> MSN: dinosaur8312_at_[hidden]

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/