
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] help: sm btl does not work when I specify the same host twice or more in the node list
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2012-02-15 11:01:44


On Feb 14, 2012, at 10:47 AM, yanyg_at_[hidden] wrote:

> Yes, in short, I start a c-shell script from the bash command line, in
> which I mpirun another c-shell script which starts the computing
> process. The only OMPI-related envars are PATH and
> LD_LIBRARY_PATH. Are there any other OMPI envars I should set?

No, there are no others you need to set. Ralph's referring to the fact that we set OMPI environment variables in the processes that are started on the remote nodes.

I was asking to ensure you hadn't set any MCA parameters in the environment that could be creating a problem. Do you have any set in files, perchance?
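
If it helps, here is a rough sketch of one way to look. It prints the active lines of the per-user file ($HOME/.openmpi/mca-params.conf) and the system-wide file (etc/openmpi-mca-params.conf under the install prefix). The function name show_mca_params and the OMPI_PREFIX variable are made up for illustration, and /usr/local is only a guess at your prefix:

```shell
# Print the active (non-comment, non-blank) lines of each MCA parameter
# file that exists. File locations are passed in by the caller.
show_mca_params() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            echo "== $f"
            # Drop blank lines and lines that start with '#'.
            grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$f" || true
        fi
    done
}

# Per-user file first, then the system-wide file under the install prefix.
# OMPI_PREFIX is a hypothetical variable; set it to your install location.
show_mca_params "$HOME/.openmpi/mca-params.conf" \
                "${OMPI_PREFIX:-/usr/local}/etc/openmpi-mca-params.conf"
```

No output means neither file exists at those locations.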

And can you run "env | grep OMPI" from the script that you invoked via mpirun?
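
Something along these lines near the top of that script would do. This is only a sketch in Bourne shell syntax (your script is c-shell, so adapt accordingly), and dump_ompi_env is a name I made up:

```shell
# Print the node name followed by every OMPI_* variable visible to this
# process, sorted so that output from different processes is easy to diff.
dump_ompi_env() {
    echo "host: $(uname -n)"
    env | grep '^OMPI_' | sort || true
}

dump_ompi_env
```

MCA parameters that come from the environment show up as OMPI_MCA_<name>.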

So just to be clear on the exact problem you're seeing:

- you mpirun on a single node and all works fine
- you mpirun on multiple nodes and all works fine (e.g., mpirun --host a,b,c your_executable)
- you mpirun on multiple nodes and list a host more than once and it hangs (e.g., mpirun --host a,a,b,c your_executable)

Is that correct?
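
To confirm the three cases mechanically, a small wrapper like the one below may be useful. It is a sketch that assumes the GNU coreutils "timeout" command is available (it exits with status 124 when the time limit expires); run_case and HANG_TIMEOUT are hypothetical names, and the 60-second default is arbitrary:

```shell
# Run a command under a time limit and report whether it finished,
# failed, or was still running when the limit expired.
run_case() {
    label=$1
    shift
    status=0
    timeout "${HANG_TIMEOUT:-60}" "$@" >/dev/null 2>&1 || status=$?
    case $status in
        0)   echo "$label: ok" ;;
        124) echo "$label: HUNG (killed after ${HANG_TIMEOUT:-60}s)" ;;
        *)   echo "$label: failed (exit $status)" ;;
    esac
}

# Hosts a, b, c and your_executable are placeholders:
# run_case "single node"    mpirun --host a       your_executable
# run_case "distinct hosts" mpirun --host a,b,c   your_executable
# run_case "repeated host"  mpirun --host a,a,b,c your_executable
```

Pick a limit comfortably longer than a normal run so that a slow job is not reported as hung.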

If so, can you attach a debugger to one of the hung processes and see exactly where it's hung? (i.e., get the stack traces)
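
With gdb, for example, a non-interactive attach can be sketched as follows. I'm assuming gdb is installed on the node and that you are allowed to attach to your own processes; dump_stacks and the GDB override variable are names invented for this sketch:

```shell
# Print a backtrace of every thread in each given process ID.
# GDB can be overridden (e.g., with a full path); it defaults to "gdb".
dump_stacks() {
    for pid in "$@"; do
        echo "== pid $pid"
        ${GDB:-gdb} -p "$pid" -batch -ex "thread apply all bt" 2>&1 || true
    done
}

# Example (PIDs are placeholders; find the real ones with ps on the node):
# dump_stacks 12345 12346
```

Traces from both processes on the host that is listed twice would be the most useful.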

Per a question from your prior mail: yes, Open MPI does create mmapped files in /tmp for use with shared memory communication. They *should* get cleaned up when you exit, though, unless something disastrous happens.
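
If you want to check for leftovers after a hang, a sketch like this lists candidate directories. The "openmpi-sessions-*" name pattern is an assumption on my part about how your version names its session directory, so adjust it if yours differs; list_ompi_sessions is a made-up name:

```shell
# List leftover Open MPI session directories directly under a temp
# directory (default /tmp). The name pattern is an assumption; see above.
list_ompi_sessions() {
    find "${1:-/tmp}" -maxdepth 1 -name 'openmpi-sessions-*' 2>/dev/null || true
}

list_ompi_sessions /tmp
```

Only remove what it finds once you are sure none of your jobs are still running on that node.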

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/