Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Status of SLURM integration
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-01-11 10:17:49

Well, yes - but it isn't quite that simple. :-/

If you want to direct-launch on SLURM without using the resv_ports option, you need to build OMPI with PMI support by adding --with-pmi to your configure command line. You may need to point it to where pmi.h resides (e.g., --with-pmi=/opt/slurm/include).

We don't do that automatically because SLURM's pmi.h is GPL-licensed, so the resulting binary falls under the GPL as well. This isn't an issue if you are just using the binary and not distributing it, but we chose not to surprise anyone.

If you build the PMI support, then you can just srun your app without using resv_ports.
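
As a rough sketch of the steps described above (not an official recipe): the script below only prints the commands unless RUN=1 is set. The PMI path is the example given above; the install prefix, the process count, and the run() helper are assumptions for illustration, and my_mpi_application is a placeholder name.

```shell
#!/bin/sh
# Sketch only: prints each command unless RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Assumed values; adjust for your site.
PMI_DIR=/opt/slurm/include     # example location of pmi.h from the message above
PREFIX="$HOME/openmpi-pmi"     # hypothetical install prefix
NPROCS=4                       # hypothetical process count

build_and_launch() {
    # Build Open MPI with PMI support (run from the Open MPI source tree),
    # then direct-launch with srun: no mpirun, no resv_ports setting.
    run ./configure --prefix="$PREFIX" --with-pmi="$PMI_DIR" &&
    run make all install &&
    run srun -n "$NPROCS" ./my_mpi_application
}

build_and_launch
```

Running it as-is just echoes the three commands; set RUN=1 from inside an Open MPI source tree to actually execute them.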


On Jan 11, 2012, at 6:04 AM, Jeff Squyres wrote:

> The latest -- 1.5.5rc2 (just released last night) -- has direct "srun my_mpi_application" integration. It's not in a final release yet, but as you can probably guess by the version number, it'll be in the final version of 1.5.5.
> We have 1-2 bugs remaining in 1.5.5 that are actively being worked on. Once those are fixed (hopefully, in the Very Near Future), 1.5.5 will be released.
> On Jan 10, 2012, at 11:38 PM, Andrew Senin wrote:
>> Hi,
>> Could you please describe the current status of SLURM integration? I
>> had a feeling srun supports direct launch of Open MPI applications
>> (without mpirun) compiled with the 1.5 branch. At least one of my
>> colleagues succeeded at that.
>> But when I installed SLURM and the head revision of the Open MPI 1.5 branch,
>> I did not manage to run it without setting the SLURM_STEP_RESV_PORTS
>> environment variable. I receive the following:
>> orte_grpcomm_modex failed
>> --> Returned "A message is attempting to be sent to a process whose
>> contact information is unknown" (-117) instead of "Success" (0)
>> --------------------------------------------------------------------------
>> [mir9:25477] *** An error occurred in MPI_Init
>> [mir9:25477] *** on a NULL communicator
>> [mir9:25477] *** Unknown error
>> [mir9:25477] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
>> So I have 2 questions:
>> 1. Is SLURM support in the head revision of the 1.5 branch stable
>> enough to use in the lab?
>> 2. Does direct launch of MPI applications require setting the
>> SLURM_STEP_RESV_PORTS environment variable?
>> Thanks,
>> Andrew Senin.
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
> --
> Jeff Squyres
> jsquyres_at_[hidden]