On Wed, Apr 27, 2011 at 2:25 PM, Ralph Castain <rhc_at_[hidden]> wrote:
> On Apr 27, 2011, at 10:09 AM, Michael Di Domenico wrote:
>> Was this ever committed to the OMPI src as something not having to be
>> run outside of OpenMPI, but as part of the PSM setup that OpenMPI
> Not that I know of - I don't think the PSM developers ever looked at it.
>> I'm having some trouble getting Slurm/OpenMPI to play nice with the
>> setup of this key. Namely, with Slurm you cannot export variables
>> from the --prolog of an srun, only from a --task-prolog.
>> Unfortunately, if you use a task-prolog, each rank gets a different
>> key, which doesn't work.
>> I'm also guessing that each unique mpirun needs its own PSM key, not
>> one for the whole system, so I can't just make it a permanent
>> parameter somewhere else.
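One possible way around the per-rank key problem, sketched in Python: derive the key deterministically from the job and step IDs, so every rank's task-prolog computes the same value. This is only a sketch under assumptions, not something from the OMPI or PSM source. The variable name OMPI_MCA_orte_precondition_transports, the two-16-hex-digit key format, and the SLURM_JOB_ID / SLURM_STEP_ID names are all assumptions to check against your installed versions, as is the convention that a task-prolog sets a variable by printing an "export NAME=value" line.

```python
import hashlib
import os


def job_wide_key(job_id, step_id):
    """Derive one key per job step; every rank computes the same value."""
    # Hash the job/step identity so all ranks agree without communicating.
    digest = hashlib.sha256(f"{job_id}.{step_id}".encode()).hexdigest()
    # Assumed key format: two 16-hex-digit halves joined by a dash.
    return f"{digest[:16]}-{digest[16:32]}"


if __name__ == "__main__":
    # Assumed Slurm variable names; the defaults only keep the sketch runnable.
    key = job_wide_key(
        os.environ.get("SLURM_JOB_ID", "0"),
        os.environ.get("SLURM_STEP_ID", "0"),
    )
    # Assumed task-prolog convention: an "export" line on stdout sets the variable.
    print(f"export OMPI_MCA_orte_precondition_transports={key}")
```

Note that a key derived this way is predictable to anyone who knows the job ID, so it only addresses the "same key on every rank, different key per mpirun" requirement, not secrecy.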
>> Also, I recall reading somewhere that the --resv-ports parameter that
>> OMPI uses from Slurm to choose a list of ports to use for TCP comms
>> tries to lock a port from the pool three times before giving up.
> Had to look back at the code - I think you misread this. I can find no evidence in the code that we try to bind that port more than once.
Perhaps I misstated. I don't believe you're trying to bind to the same
port twice during the same session; I believe the code re-uses
similar ports from session to session. What I believe happens (but I
could be totally wrong) is that the previous session releases the port, but
Linux isn't quite done with it when the new session tries to bind to
it, in which case it tries three times and then fails the job.
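To illustrate the behaviour I mean, here is a minimal Python sketch of a bind that retries a fixed number of times when the port is still in use, then gives up. It is not taken from the OMPI or Slurm code; the function name bind_with_retry, the default of three attempts, and the delay are hypothetical and chosen only to mirror the description above.

```python
import errno
import socket
import time


def bind_with_retry(port, attempts=3, delay=0.1, host="127.0.0.1"):
    """Bind a TCP socket to `port`, retrying while the address is in use."""
    last_error = None
    for attempt in range(1, attempts + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            return sock  # success: the caller owns the bound socket
        except OSError as exc:
            sock.close()
            if exc.errno != errno.EADDRINUSE:
                raise  # unrelated failure: do not retry
            last_error = exc
            if attempt < attempts:
                time.sleep(delay)  # give the previous owner time to let go
    # Every attempt hit "address in use": give up, as the job would.
    raise last_error
```

If the port stays busy across all attempts, the final EADDRINUSE error is raised, which would correspond to the job failing. If lingering connections from the previous session are the cause, setting SO_REUSEADDR before the bind is the usual way to avoid that, though I haven't checked whether it applies here.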