
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] openmpi/pbsdsh/Torque problem
From: Ralph Castain (rhc_at_[hidden])
Date: 2011-04-02 23:07:20


I'm afraid I have no idea what you are talking about. Are you saying you are launching OMPI processes via mpirun, but with "pbsdsh" as the plm_rsh_agent???

That would be a very bad idea. If you are running under Torque, then let mpirun "do the right thing" and use its Torque-based launcher.

On the other hand, if you are trying to launch MPI processes directly using pbsdsh, then that simply won't work. The procs will have no idea how to wire up or communicate.

On Apr 2, 2011, at 8:36 PM, Laurence Marks wrote:

> I have a problem that may or may not be an Open MPI issue, but since this list
> was useful before with a race condition, I am posting it here.
>
> I am trying to use pbsdsh as an ssh replacement, which the sysadmins are pushing because
> Torque does not know about ssh tasks launched from within a task. In a simple
> case, a script launches three MPI tasks in parallel:
>
> Task1: NodeA
> Task2: NodeB and NodeC
> Task3: NodeD
>
> (some cores on each, all handled correctly). The problem is reproducible (though with
> different nodes and numbers of cores): Task1 and Task3 work fine, and the
> MPI task starts on NodeB, but nothing starts on NodeC; it appears that
> NodeC does not communicate. The layout does not have to be this one; it could be
>
> Task1: NodeA NodeB
> Task2: NodeC NodeD
>
> Here NodeC will start, and it looks as if NodeD never starts anything.
> I've also run it with 4 tasks (1, 3 and 4 work), and if Task2 only uses one
> node it is fine (the number of cores does not matter).
>
> --
> Laurence Marks
> Department of Materials Science and Engineering
> MSE Rm 2036 Cook Hall
> 2220 N Campus Drive
> Northwestern University
> Evanston, IL 60208, USA
> Tel: (847) 491-3996 Fax: (847) 491-7820
> email: L-marks at northwestern dot edu
> Web: www.numis.northwestern.edu
> Chair, Commission on Electron Crystallography of IUCR
> www.numis.northwestern.edu/
> Research is to see what everybody else has seen, and to think what
> nobody else has thought
> Albert Szent-Györgyi