Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] [EXTERNAL] Re: (OpenMPI for Cray XE6 ) How to set mca parameters through aprun?
From: David Whitaker (whitaker_at_[hidden])
Date: 2013-11-25 14:42:52


Hi,
    I'd like to point out that Cray doesn't run a Workload Manager (WLM)
   on the compute nodes. So if you use PBS or Torque/Moab, your job
   script ends up running on the login node, and you have to use something
   like "aprun" or "ccmrun" to launch the job on the compute nodes.
   Unless "mpirun" or "mpiexec" is Cray-aware, it will try to
   launch processes on the login or MOM node instead.
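    A quick way to see the difference (a small illustration, assuming you
   are sitting in an interactive allocation on the MOM node; "hostname" is
   just an arbitrary command):

      hostname             # runs where your shell is: the login/MOM node
      aprun -n 1 hostname  # runs on one of the allocated compute nodes via ALPS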

    I've only run OpenMPI-linked codes in CCM (Cray Cluster
   Compatibility) mode. On a system with PBS/Torque/Moab, I use:

   qsub -I -lgres=ccm -lmppwidth=32 -lmppnppn=16

   This gives me an interactive PBS/Torque/Moab session.
    I then do (a batch-script version of these steps is sketched below):
      1) cd $PBS_O_WORKDIR
      2) module load ccm   # gets access to the ccmrun command
      3) setenv PATH /lus/scratch/whitaker/OpenMPI/bin:$PATH
      4) setenv LD_LIBRARY_PATH /lus/scratch/whitaker/OpenMPI/lib
      5) \rm -rf hosts
      6) cat $PBS_NODEFILE > hosts
      7) ccmrun /lus/scratch/whitaker/OpenMPI/bin/mpirun --mca plm ^tm --mca ras ^tm --mca btl openib,sm,self -np 32 -machinefile hosts ./hello
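
      For completeness, here is a sketch of the same steps as a
      non-interactive batch script. The paths, csh, and the PBS directives
      just mirror my interactive session above; adjust them to your site,
      and note that some systems require sourcing the modules init script
      before "module load" works in a batch shell:

      #!/bin/csh
      #PBS -l gres=ccm
      #PBS -l mppwidth=32
      #PBS -l mppnppn=16
      cd $PBS_O_WORKDIR
      module load ccm   # gets access to ccmrun
      setenv PATH /lus/scratch/whitaker/OpenMPI/bin:$PATH
      setenv LD_LIBRARY_PATH /lus/scratch/whitaker/OpenMPI/lib
      \rm -rf hosts
      cat $PBS_NODEFILE > hosts
      ccmrun /lus/scratch/whitaker/OpenMPI/bin/mpirun --mca plm ^tm --mca ras ^tm \
             --mca btl openib,sm,self -np 32 -machinefile hosts ./hello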

         If you are running under Torque/Moab, OpenMPI will attempt
       to use the Torque/Moab API to launch the job. This won't work,
       since Cray does not run a Torque/Moab MOM process
       on the compute nodes. Hence, you have to turn off OpenMPI's
       use of Torque/Moab, which is what the "--mca plm ^tm --mca ras ^tm"
       flags above do.
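
         As an aside (not part of the run above, just Open MPI's standard
       mechanism): the same MCA settings can also be given as environment
       variables instead of "--mca" flags, e.g. in csh:

         setenv OMPI_MCA_plm "^tm"
         setenv OMPI_MCA_ras "^tm"

       mpirun will pick these up just as if they had been passed on the
       command line.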

         OpenMPI has a version that speaks uGNI natively on the
       Cray. I have no experience with that.

ccmrun /lus/scratch/whitaker/OpenMPI/bin/mpirun --mca plm ^tm --mca ras ^tm -np 32 -machinefile hosts ./hello
  Hello World!, I am 0 of 32 (NodeID=nid00056)
  Hello World!, I am 1 of 32 (NodeID=nid00056)
  Hello World!, I am 2 of 32 (NodeID=nid00056)
  Hello World!, I am 3 of 32 (NodeID=nid00056)
  Hello World!, I am 4 of 32 (NodeID=nid00056)
  Hello World!, I am 5 of 32 (NodeID=nid00056)
  Hello World!, I am 6 of 32 (NodeID=nid00056)
  Hello World!, I am 7 of 32 (NodeID=nid00056)
  Hello World!, I am 8 of 32 (NodeID=nid00056)
  Hello World!, I am 9 of 32 (NodeID=nid00056)
  Hello World!, I am 10 of 32 (NodeID=nid00056)
  Hello World!, I am 11 of 32 (NodeID=nid00056)
  Hello World!, I am 12 of 32 (NodeID=nid00056)
  Hello World!, I am 13 of 32 (NodeID=nid00056)
  Hello World!, I am 14 of 32 (NodeID=nid00056)
  Hello World!, I am 15 of 32 (NodeID=nid00056)
  Hello World!, I am 16 of 32 (NodeID=nid00057)
  Hello World!, I am 17 of 32 (NodeID=nid00057)
  Hello World!, I am 18 of 32 (NodeID=nid00057)
  Hello World!, I am 21 of 32 (NodeID=nid00057)
  Hello World!, I am 19 of 32 (NodeID=nid00057)
  Hello World!, I am 20 of 32 (NodeID=nid00057)
  Hello World!, I am 22 of 32 (NodeID=nid00057)
  Hello World!, I am 23 of 32 (NodeID=nid00057)
  Hello World!, I am 24 of 32 (NodeID=nid00057)
  Hello World!, I am 25 of 32 (NodeID=nid00057)
  Hello World!, I am 26 of 32 (NodeID=nid00057)
  Hello World!, I am 27 of 32 (NodeID=nid00057)
  Hello World!, I am 28 of 32 (NodeID=nid00057)
  Hello World!, I am 29 of 32 (NodeID=nid00057)
  Hello World!, I am 30 of 32 (NodeID=nid00057)
  Hello World!, I am 31 of 32 (NodeID=nid00057)

           Hope this helps,
                 Dave

On 11/23/2013 05:27 PM, Teranishi, Keita wrote:
> Here is the module environment, and I allocate interactive node by "qsub -I
> -lmppwidth=32 -lmppnppn=16."
> module list
> Currently Loaded Modulefiles:
> 1) modules/3.2.6.7
> 2) craype-network-gemini
> 3) cray-mpich2/5.6.4
> 4) atp/1.6.3
> 5) xe-sysroot/4.1.40
> 6) switch/1.0-1.0401.36779.2.72.gem
> 7) shared-root/1.0-1.0401.37253.3.50.gem
> 8) pdsh/2.26-1.0401.37449.1.1.gem
> 9) nodehealth/5.0-1.0401.38460.12.18.gem
> 10) lbcd/2.1-1.0401.35360.1.2.gem
> 11) hosts/1.0-1.0401.35364.1.115.gem
> 12) configuration/1.0-1.0401.35391.1.2.gem
> 13) ccm/2.2.0-1.0401.37254.2.142
> 14) audit/1.0.0-1.0401.37969.2.32.gem
> 15) rca/1.0.0-2.0401.38656.2.2.gem
> 16) dvs/1.8.6_0.9.0-1.0401.1401.1.120
> 17) csa/3.0.0-1_2.0401.37452.4.50.gem
> 18) job/1.5.5-0.1_2.0401.35380.1.10.gem
> 19) xpmem/0.1-2.0401.36790.4.3.gem
> 20) gni-headers/2.1-1.0401.5675.4.4.gem
> 21) dmapp/3.2.1-1.0401.5983.4.5.gem
> 22) pmi/4.0.1-1.0000.9421.73.3.gem
> 23) ugni/4.0-1.0401.5928.9.5.gem
> 24) udreg/2.3.2-1.0401.5929.3.3.gem
> 25) xt-libsci/12.0.00
> 26) xt-totalview/8.12.0
> 27) totalview-support/1.1.5
> 28) gcc/4.7.2
> 29) xt-asyncpe/5.22
> 30) eswrap/1.0.8
> 31) craype-mc8
> 32) PrgEnv-gnu/4.1.40
> 33) moab/5.4.4
>
>
> In interactive mode (as well as batch mode), "aprun --np 32" can run my
> OpenMPI code.
> aprun -n 32 ./cpi
> Process 5 of 32 is on nid00015
> Process 7 of 32 is on nid00015
> Process 2 of 32 is on nid00015
> Process 0 of 32 is on nid00015
> Process 13 of 32 is on nid00015
> Process 10 of 32 is on nid00015
> Process 3 of 32 is on nid00015
> Process 1 of 32 is on nid00015
> Process 6 of 32 is on nid00015
> Process 4 of 32 is on nid00015
> Process 15 of 32 is on nid00015
> Process 9 of 32 is on nid00015
> Process 12 of 32 is on nid00015
> Process 8 of 32 is on nid00015
> Process 11 of 32 is on nid00015
> Process 14 of 32 is on nid00015
> Process 29 of 32 is on nid00014
> Process 22 of 32 is on nid00014
> Process 17 of 32 is on nid00014
> Process 28 of 32 is on nid00014
> Process 31 of 32 is on nid00014
> Process 26 of 32 is on nid00014
> Process 30 of 32 is on nid00014
> Process 16 of 32 is on nid00014
> Process 25 of 32 is on nid00014
> Process 24 of 32 is on nid00014
> Process 21 of 32 is on nid00014
> Process 20 of 32 is on nid00014
> Process 27 of 32 is on nid00014
> Process 19 of 32 is on nid00014
> Process 18 of 32 is on nid00014
> Process 23 of 32 is on nid00014
> pi is approximately 3.1415926544231265, Error is 0.0000000008333334
> wall clock time = 0.004645
>
>
> Here is what I have with openmpi.
> mpiexec --bind-to-core --mca plm_base_strip_prefix_from_node_names 0 -np 32 ./cpi
> --------------------------------------------------------------------------
> There are not enough slots available in the system to satisfy the 32 slots
> that were requested by the application:
> ./cpi
>
> Either request fewer slots for your application, or make more slots available
> for use.
> --------------------------------------------------------------------------
>
>
>
> From: Ralph Castain <rhc_at_[hidden] <mailto:rhc_at_[hidden]>>
> Reply-To: Open MPI Users <users_at_[hidden] <mailto:users_at_[hidden]>>
> Date: Saturday, November 23, 2013 2:27 PM
> To: Open MPI Users <users_at_[hidden] <mailto:users_at_[hidden]>>
> Subject: [EXTERNAL] Re: [OMPI users] (OpenMPI for Cray XE6 ) How to set mca
> parameters through aprun?
>
> My guess is that you aren't doing the allocation correctly - since you are
> using qsub, can I assume you have Moab as your scheduler?
>
> aprun should be forwarding the envars - do you see them if you just run "aprun
> -n 1 printenv"?
>
> On Nov 23, 2013, at 2:13 PM, Teranishi, Keita <knteran_at_[hidden]
> <mailto:knteran_at_[hidden]>> wrote:
>
>> Hi,
>>
>> I installed OpenMPI on our small XE6 using the configure options under
>> /contrib directory. It appears it is working fine, but it ignores MCA
>> parameters (set in env var). So I switched to mpirun (in OpenMPI) and it can
>> handle MCA parameters somehow. However, mpirun fails to allocate process by
>> cores. For example, I allocated 32 cores (on 2 nodes) by "qsub
>> --lmppwidth=32 --lmppnppn=16", mpirun recognizes it as 2 slots. Is it
>> possible to mpirun to handle mluticore nodes of XE6 properly or is there any
>> options to handle MCA parameters for aprun?
>>
>> Regards,
>> -----------------------------------------------------------------------------
>> Keita Teranishi
>> Principal Member of Technical Staff
>> Scalable Modeling and Analysis Systems
>> Sandia National Laboratories
>> Livermore, CA 94551
>> +1 (925) 294-3738
>>

-- 
CCCCCCCCCCCCCCCCCCCCCCFFFFFFFFFFFFFFFFFFFFFFFFFDDDDDDDDDDDDDDDDDDDDD
David Whitaker, Ph.D.                              whitaker_at_[hidden]
Aerospace CFD Specialist                        phone: (651)605-9078
ISV Applications/Cray Inc                         fax: (651)605-9001
CCCCCCCCCCCCCCCCCCCCCCFFFFFFFFFFFFFFFFFFFFFFFFFDDDDDDDDDDDDDDDDDDDDD