On Apr 27, 2011, at 10:09 AM, Michael Di Domenico wrote:
> Was this ever committed to the OMPI src as something not having to be
> run outside of OpenMPI, but as part of the PSM setup that OpenMPI
Not that I know of - I don't think the PSM developers ever looked at it.
> I'm having some trouble getting Slurm/OpenMPI to play nice with the
> setup of this key. Namely, with slurm you cannot export variables
> from the --prolog of an srun, only from a --task-prolog.
> Unfortunately, if you use a task-prolog, each rank gets a different
> key, which doesn't work.
> I'm also guessing that each unique mpirun needs its own psm key, not
> one for the whole system, so I can't just make it a permanent
> parameter somewhere else.
> Also, I recall reading somewhere that the --resv-ports parameter that
> OMPI uses from slurm to choose a list of ports to use for TCP comms
> tries to lock a port from the pool three times before giving up.
Had to look back at the code - I think you misread this. I can find no evidence in the code that we try to bind that port more than once.
> Can someone tell me where that parameter is set? I'd like to set it to
> a higher value. We're seeing issues where running a large number of
> short sruns sequentially is causing some of the mpiruns in the
> stream to be killed because they could not lock the ports.
> I suspect the interval between when the port is actually closed in
> linux and when ompi re-opens a new port is very short, so we're trying
> three times and giving up. I have more than enough ports in the
> resv-ports list, 30k, but I suspect there is some random re-use being
> done and it's failing.
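As an illustration of the retry behaviour being asked about (not of what Open MPI does; per the reply above, the code appears to bind only once), here is a minimal Python sketch of a bounded retry over a reserved port pool. The function name `bind_from_pool`, the default of three passes, and the backoff delay are all hypothetical and are not taken from Open MPI or Slurm.

```python
import errno
import socket
import time


def bind_from_pool(ports, attempts=3, delay=0.05, host="127.0.0.1"):
    """Make up to `attempts` passes over `ports` and return a bound socket.

    Hypothetical helper: only "address in use" errors are retried, and the
    pause between passes doubles each time to give the kernel a chance to
    release recently closed ports.
    """
    for attempt in range(attempts):
        for port in ports:
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            try:
                sock.bind((host, port))
                return sock
            except OSError as exc:
                sock.close()
                if exc.errno != errno.EADDRINUSE:
                    raise  # unrelated failure: do not mask it
        if attempt < attempts - 1:
            time.sleep(delay * (2 ** attempt))
    raise OSError(errno.EADDRINUSE, "no free port in the reserved pool")
```

With a pool as large as the 30k ports mentioned above, walking the whole pool on each pass may matter more than the number of passes, since a single busy port no longer causes a failure.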
> On Mon, Jan 3, 2011 at 10:00 AM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
>> Yo Ralph --
>> I see this was committed https://svn.open-mpi.org/trac/ompi/changeset/24197. Do you want to add a blurb in README about it, and/or have this executable compiled as part of the PSM MTL and then installed into $bindir (maybe named ompi-psm-keygen)?
>> Right now, it's only compiled as part of "make check" and not installed, right?
>> On Dec 30, 2010, at 5:07 PM, Ralph Castain wrote:
>>> Run the program only once - it can be in the prolog of the job if you like. The output value needs to be in the env of every rank.
>>> You can reuse the value as many times as you like - it doesn't have to be unique for each job. There is nothing magic about the value itself.
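To make the "run once, export everywhere" advice concrete, here is a minimal Python sketch of a key generator. It is not the attached program, which is compiled with mpicc. It assumes that the variable is named `OMPI_MCA_orte_precondition_transports` and that the value is two 16-digit hex words joined by a dash. The thread states neither detail, so check both against the real psm_keygen output.

```python
import secrets

# Assumed variable name; the thread itself never spells it out.
ENV_NAME = "OMPI_MCA_orte_precondition_transports"


def make_transport_key() -> str:
    """Return two random 64-bit values as 16-digit hex words joined by '-'."""
    first = secrets.randbits(64)
    second = secrets.randbits(64)
    return f"{first:016x}-{second:016x}"


def export_line(key: str) -> str:
    """Format the key as a shell export statement."""
    return f"export {ENV_NAME}={key}"


if __name__ == "__main__":
    # Run once per job and evaluate the output in the launching shell,
    # so that every rank inherits the same value.
    print(export_line(make_transport_key()))
```

Evaluating the printed line once in the job script, before mpirun or srun is invoked, gives every rank the same value. Generating it separately per task would recreate the "each rank gets a different key" problem described above.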
>>> On Dec 30, 2010, at 2:11 PM, Michael Di Domenico wrote:
>>>> How early does this need to run? Can I run it as part of a task
>>>> prolog, or does it need to be the shell env for each rank? And does
>>>> it need to run on one node or all the nodes in the job?
>>>> On Thu, Dec 30, 2010 at 8:54 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>>> Well, I couldn't do it as a patch - proved too complicated as the psm system looks for the value early in the boot procedure.
>>>>> What I can do is give you the attached key generator program. It outputs the envar required to run your program. So if you run the attached program and then export the output into your environment, you should be okay. Looks like this:
>>>>> $ ./psm_keygen
>>>>> You compile the program with the usual mpicc.
>>>>> Let me know if this solves the problem (or not).