Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] [EXTERNAL] Re: What version of PMI (Cray XE6) is working for OpenMPI-1.6.5?
From: Teranishi, Keita (knteran_at_[hidden])
Date: 2013-09-03 19:04:40


Nathan,

It is close to Cielo and uses the resource manager under
/opt/cray/xe-sysroot/4.1.40/usr.

Currently Loaded Modulefiles:
  1) modules/3.2.6.7
  2) craype-network-gemini
  3) cray-mpich2/5.6.4
  4) atp/1.6.3
  5) xe-sysroot/4.1.40
  6) switch/1.0-1.0401.36779.2.72.gem
  7) shared-root/1.0-1.0401.37253.3.50.gem
  8) pdsh/2.26-1.0401.37449.1.1.gem
  9) nodehealth/5.0-1.0401.38460.12.18.gem
 10) lbcd/2.1-1.0401.35360.1.2.gem
 11) hosts/1.0-1.0401.35364.1.115.gem
 12) configuration/1.0-1.0401.35391.1.2.gem
 13) ccm/2.2.0-1.0401.37254.2.142
 14) audit/1.0.0-1.0401.37969.2.32.gem
 15) rca/1.0.0-2.0401.38656.2.2.gem
 16) dvs/1.8.6_0.9.0-1.0401.1401.1.120
 17) csa/3.0.0-1_2.0401.37452.4.50.gem
 18) job/1.5.5-0.1_2.0401.35380.1.10.gem
 19) xpmem/0.1-2.0401.36790.4.3.gem
 20) gni-headers/2.1-1.0401.5675.4.4.gem
 21) dmapp/3.2.1-1.0401.5983.4.5.gem
 22) pmi/2.1.4-1.0000.8596.8.9.gem
 23) ugni/4.0-1.0401.5928.9.5.gem
 24) udreg/2.3.2-1.0401.5929.3.3.gem
 25) xt-libsci/12.0.00
 26) xt-totalview/8.12.0
 27) totalview-support/1.1.4
 28) gcc/4.7.2
 29) xt-asyncpe/5.22
 30) eswrap/1.0.8
 31) craype-mc8
 32) PrgEnv-gnu/4.1.40
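
Item 22 confirms that the recommended pmi/2.1.4 module is loaded. A quick way to
double-check which PMI module is active and what it sets (a minimal sketch using
standard environment-modules commands; the module name is copied from the
listing above):

  module list 2>&1 | grep -i pmi              # 'module list' prints to stderr
  module show pmi/2.1.4-1.0000.8596.8.9.gem   # show the paths this module sets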

Thanks,
Keita

On 9/3/13 3:42 PM, "Nathan Hjelm" <hjelmn_at_[hidden]> wrote:

>Hmm, what CLE release is your development cluster running? It is the value
>after PrgEnv; e.g., on Cielito we have 4.1.40.
>
>32) PrgEnv-gnu/4.1.40
>
>We have not yet fully tested Open MPI on CLE 5.x.x.
>
>-Nathan Hjelm
>HPC-3, LANL
>
>On Tue, Sep 03, 2013 at 10:33:57PM +0000, Teranishi, Keita wrote:
>> Hi,
>>
>> Here is what I put in my PBS script to allocate only single node (I want
>> to use 16 MPI processes in a single node).
>>
>> #PBS -l mppwidth=16
>> #PBS -l mppnppn=16
>> #PBS -l mppdepth=1
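
One way to double-check what that request actually produced is to dump the
allocation environment from inside the job (a sketch only; the exact variables
and tools depend on the local PBS/ALPS installation):

  env | grep -i '^PBS' | sort   # variables PBS exports into the job environment
  apstat -r                     # ALPS view of the reservation, if apstat is available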
>>
>> Here is the output from aprun (aprun -n 16 -N 16).
>> Process 2 of 16 is on nid00017
>> Process 5 of 16 is on nid00017
>> Process 8 of 16 is on nid00017
>> Process 12 of 16 is on nid00017
>> Process 4 of 16 is on nid00017
>> Process 14 of 16 is on nid00017
>> Process 0 of 16 is on nid00017
>> Process 1 of 16 is on nid00017
>> Process 3 of 16 is on nid00017
>> Process 13 of 16 is on nid00017
>> Process 9 of 16 is on nid00017
>> Process 6 of 16 is on nid00017
>> Process 11 of 16 is on nid00017
>> Process 10 of 16 is on nid00017
>> Process 7 of 16 is on nid00017
>> Process 15 of 16 is on nid00017
>>
>>
>>
>> I am guessing that the Cray XE6 here is different from the others in
>> production (it is a one-cabinet configuration for code development), and I am
>> afraid mpirun/mpiexec is building the aprun command incorrectly. Do I have
>> to edit the script in contrib?
>>
>>
>> Thanks,
>> Keita
>>
>> On 9/3/13 2:51 PM, "Ralph Castain" <rhc_at_[hidden]> wrote:
>>
>> >Interesting - and do you have an allocation? If so, what was it - i.e.,
>> >can you check the allocation envar to see if you have 16 slots?
>> >
>> >
>> >On Sep 3, 2013, at 1:38 PM, "Teranishi, Keita" <knteran_at_[hidden]> wrote:
>> >
>> >> It is what I got.
>> >>
>> >>
>>
>> >> --------------------------------------------------------------------------
>> >> There are not enough slots available in the system to satisfy the 16
>> >> slots that were requested by the application:
>> >>   /home/knteran/test-openmpi/cpi
>> >>
>> >> Either request fewer slots for your application, or make more slots
>> >> available for use.
>> >> --------------------------------------------------------------------------
>> >>
>> >> Thanks,
>> >> Keita
>> >>
>> >>
>> >>
>> >> On 9/3/13 1:26 PM, "Ralph Castain" <rhc_at_[hidden]> wrote:
>> >>
>> >>> How does it fail?
>> >>>
>> >>> On Sep 3, 2013, at 1:19 PM, "Teranishi, Keita" <knteran_at_[hidden]> wrote:
>> >>>
>> >>>> Nathan,
>> >>>>
>> >>>> Thanks for the help. I can run a job using Open MPI, assigning a single
>> >>>> process per node. However, I have been failing to run a job using
>> >>>> multiple MPI ranks on a single node. In other words, "mpiexec
>> >>>> --bind-to-core --npernode 16 --n 16 ./test" never works (aprun -n 16
>> >>>> works fine). Do you have any thoughts about it?
>> >>>>
>> >>>> Thanks,
>> >>>> ---------------------------------------------
>> >>>> Keita Teranishi
>> >>>> R&D Principal Staff Member
>> >>>> Scalable Modeling and Analysis Systems
>> >>>> Sandia National Laboratories
>> >>>> Livermore, CA 94551
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> On 8/30/13 8:49 AM, "Hjelm, Nathan T" <hjelmn_at_[hidden]> wrote:
>> >>>>
>> >>>>> Replace install_path with the path where you want Open MPI installed.
>> >>>>>
>> >>>>> ./configure --prefix=install_path
>> >>>>> --with-platform=contrib/platform/lanl/cray_xe6/optimized-lustre
>> >>>>> make
>> >>>>> make install
>> >>>>>
>> >>>>> To use Open MPI just set the PATH and LD_LIBRARY_PATH:
>> >>>>>
>> >>>>> PATH=install_path/bin:$PATH
>> >>>>> LD_LIBRARY_PATH=install_path/lib:$LD_LIBRARY_PATH
>> >>>>>
>> >>>>> You can then use mpicc, mpicxx, mpif90, etc. to compile, and either
>> >>>>> mpirun or aprun to run. If you are running at scale I would recommend
>> >>>>> against using aprun for now. I also recommend you change your
>> >>>>> programming environment to either PrgEnv-gnu or PrgEnv-intel. The PGI
>> >>>>> compiler can be a PIA. It is possible to build with the Cray compiler,
>> >>>>> but it takes patching config.guess and changing some autoconf stuff.
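
Putting those steps together, a batch-job fragment using the self-built install
might look roughly like this (a sketch only; install_path and the test program
name are placeholders, not taken from the thread):

  export PATH=install_path/bin:$PATH
  export LD_LIBRARY_PATH=install_path/lib:$LD_LIBRARY_PATH
  mpicc -o cpi cpi.c                   # compile with the Open MPI wrapper
  mpirun -np 16 --npernode 16 ./cpi    # 16 ranks on one node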
>> >>>>>
>> >>>>> -Nathan
>> >>>>>
>> >>>>> Please excuse the horrible Outlook-style quoting.
>> >>>>> ________________________________________
>> >>>>> From: users [users-bounces_at_[hidden]] on behalf of Teranishi, Keita
>> >>>>> [knteran_at_[hidden]]
>> >>>>> Sent: Thursday, August 29, 2013 8:01 PM
>> >>>>> To: Open MPI Users
>> >>>>> Subject: Re: [OMPI users] [EXTERNAL] Re: What version of PMI (Cray XE6)
>> >>>>> is working for OpenMPI-1.6.5?
>> >>>>>
>> >>>>> Thanks for the info. Is it still possible to build it myself? What is
>> >>>>> the procedure beyond running the configure script?
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On 8/23/13 2:37 PM, "Nathan Hjelm" <hjelmn_at_[hidden]> wrote:
>> >>>>>
>> >>>>>> On Fri, Aug 23, 2013 at 09:14:25PM +0000, Teranishi, Keita wrote:
>> >>>>>>> Hi,
>> >>>>>>> I am trying to install Open MPI 1.6.5 on a Cray XE6 and am very
>> >>>>>>> curious about the current support for PMI. In previous discussions,
>> >>>>>>> there was a comment on the version of PMI (it works with 2.1.4, but
>> >>>>>>> fails with 3.0). Our
>> >>>>>>
>> >>>>>> Open MPI 1.6.5 does not have support for the XE-6. Use 1.7.2 instead.
>> >>>>>>
>> >>>>>>> machine has PMI 2.1.4 and PMI 4.0 (default). Which version do you
>> >>>>>>
>> >>>>>> There was a regression in PMI 3.x.x that still exists in 4.0.x that
>> >>>>>> causes a warning to be printed on every rank when using mpirun. We are
>> >>>>>> working with Cray to resolve the issue. For now, use 2.1.4. See the
>> >>>>>> platform files in contrib/platform/lanl/cray_xe6. The platform files
>> >>>>>> you would want to use are debug-lustre or optimized-lustre.
>> >>>>>>
>> >>>>>> BTW, 1.7.2 is installed on Cielo and Cielito. Just run:
>> >>>>>>
>> >>>>>> module swap PrgEnv-pgi PrgEnv-gnu (PrgEnv-intel also works)
>> >>>>>> module unload cray-mpich2 xt-libsci
>> >>>>>> module load openmpi/1.7.2
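
After the swap/load, a quick check that the Open MPI wrappers and launcher are
the ones being picked up (a minimal sketch; output will differ by site):

  which mpicc mpirun
  mpirun --version    # should report Open MPI 1.7.2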
>> >>>>>>
>> >>>>>>
>> >>>>>> -Nathan Hjelm
>> >>>>>> Open MPI Team, HPC-3, LANL