
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] [EXTERNAL] Re: What version of PMI (Cray XE6) is working for OpenMPI-1.6.5?
From: Nathan Hjelm (hjelmn_at_[hidden])
Date: 2013-09-03 19:19:25


Interesting. That should work then. I haven't tested it under batch mode though. Let
me try to reproduce on Cielito and see what happens.

-Nathan

On Tue, Sep 03, 2013 at 11:04:40PM +0000, Teranishi, Keita wrote:
> Nathan,
>
> It is close to Cielo and uses the resource manager under
> /opt/cray/xe-sysroot/4.1.40/usr.
>
> Currently Loaded Modulefiles:
>  1) modules/3.2.6.7
>  2) craype-network-gemini
>  3) cray-mpich2/5.6.4
>  4) atp/1.6.3
>  5) xe-sysroot/4.1.40
>  6) switch/1.0-1.0401.36779.2.72.gem
>  7) shared-root/1.0-1.0401.37253.3.50.gem
>  8) pdsh/2.26-1.0401.37449.1.1.gem
>  9) nodehealth/5.0-1.0401.38460.12.18.gem
> 10) lbcd/2.1-1.0401.35360.1.2.gem
> 11) hosts/1.0-1.0401.35364.1.115.gem
> 12) configuration/1.0-1.0401.35391.1.2.gem
> 13) ccm/2.2.0-1.0401.37254.2.142
> 14) audit/1.0.0-1.0401.37969.2.32.gem
> 15) rca/1.0.0-2.0401.38656.2.2.gem
> 16) dvs/1.8.6_0.9.0-1.0401.1401.1.120
> 17) csa/3.0.0-1_2.0401.37452.4.50.gem
> 18) job/1.5.5-0.1_2.0401.35380.1.10.gem
> 19) xpmem/0.1-2.0401.36790.4.3.gem
> 20) gni-headers/2.1-1.0401.5675.4.4.gem
> 21) dmapp/3.2.1-1.0401.5983.4.5.gem
> 22) pmi/2.1.4-1.0000.8596.8.9.gem
> 23) ugni/4.0-1.0401.5928.9.5.gem
> 24) udreg/2.3.2-1.0401.5929.3.3.gem
> 25) xt-libsci/12.0.00
> 26) xt-totalview/8.12.0
> 27) totalview-support/1.1.4
> 28) gcc/4.7.2
> 29) xt-asyncpe/5.22
> 30) eswrap/1.0.8
> 31) craype-mc8
> 32) PrgEnv-gnu/4.1.40
>
>
> Thanks,
> Keita
>
>
>
> On 9/3/13 3:42 PM, "Nathan Hjelm" <hjelmn_at_[hidden]> wrote:
>
> >Hmm, what CLE release is your development cluster running? It is the
> >version number after PrgEnv, e.g. on Cielito we have 4.1.40.
> >
> >32) PrgEnv-gnu/4.1.40
> >
> >We have not yet fully tested Open MPI on CLE 5.x.x.
> >
> >-Nathan Hjelm
> >HPC-3, LANL
> >
> >On Tue, Sep 03, 2013 at 10:33:57PM +0000, Teranishi, Keita wrote:
> >> Hi,
> >>
> >> Here is what I put in my PBS script to allocate only single node (I want
> >> to use 16 MPI processes in a single node).
> >>
> >> #PBS -l mppwidth=16
> >> #PBS -l mppnppn=16
> >> #PBS -l mppdepth=1
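For context, a complete job script along these lines might look like the sketch below. The job name, walltime, and executable path are placeholder assumptions; only the three mpp* directives come from the message above.

```shell
#!/bin/bash
# Hypothetical PBS job script: one Cray XE6 node, 16 MPI ranks.
#PBS -N cpi-test           # job name (placeholder)
#PBS -l walltime=00:10:00  # walltime (placeholder)
#PBS -l mppwidth=16        # total number of MPI processes
#PBS -l mppnppn=16         # processes per node
#PBS -l mppdepth=1         # threads (depth) per process

cd "$PBS_O_WORKDIR"        # run from the submission directory
aprun -n 16 -N 16 ./cpi    # launch via ALPS; ./cpi is a placeholder
```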
> >>
> >> Here is the output from aprun (aprun -n 16 -N 16).
> >> Process 2 of 16 is on nid00017
> >> Process 5 of 16 is on nid00017
> >> Process 8 of 16 is on nid00017
> >> Process 12 of 16 is on nid00017
> >> Process 4 of 16 is on nid00017
> >> Process 14 of 16 is on nid00017
> >> Process 0 of 16 is on nid00017
> >> Process 1 of 16 is on nid00017
> >> Process 3 of 16 is on nid00017
> >> Process 13 of 16 is on nid00017
> >> Process 9 of 16 is on nid00017
> >> Process 6 of 16 is on nid00017
> >> Process 11 of 16 is on nid00017
> >> Process 10 of 16 is on nid00017
> >> Process 7 of 16 is on nid00017
> >> Process 15 of 16 is on nid00017
> >>
> >>
> >>
> >> I am guessing that the Cray XE6 here is different from the others in
> >> production (it is a 1-cabinet configuration for code development) and I am
> >> afraid mpirun/mpiexec does the wrong instantiation of the aprun command.
> >> Do I have to edit the script in contrib?
> >>
> >>
> >> Thanks,
> >> Keita
> >>
> >> On 9/3/13 2:51 PM, "Ralph Castain" <rhc_at_[hidden]> wrote:
> >>
> >> >Interesting - and do you have an allocation? If so, what was it - i.e.,
> >> >can you check the allocation envar to see if you have 16 slots?
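One quick way to do the check suggested here is to print the scheduler's environment from inside the batch job. The exact variable names (e.g. PBS_NNODES, PBS_NODEFILE) vary by Torque/Moab version and are assumptions here:

```shell
# Hypothetical allocation check inside a PBS job; variable names may differ.
echo "PBS_NNODES=${PBS_NNODES:-unset}"      # slot/node count, if defined
echo "PBS_NODEFILE=${PBS_NODEFILE:-unset}"  # path to the node list, if defined
env | grep -i -E 'pbs|mpp' | sort           # dump all PBS/mpp-related variables
```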
> >> >
> >> >
> >> >On Sep 3, 2013, at 1:38 PM, "Teranishi, Keita" <knteran_at_[hidden]>
> >>wrote:
> >> >
> >> >> It is what I got.
> >> >>
> >> >>
> >> >> --------------------------------------------------------------------------
> >> >> There are not enough slots available in the system to satisfy the 16
> >> >> slots that were requested by the application:
> >> >>   /home/knteran/test-openmpi/cpi
> >> >>
> >> >> Either request fewer slots for your application, or make more slots
> >> >> available for use.
> >> >> --------------------------------------------------------------------------
> >> >>
> >> >> Thanks,
> >> >> Keita
> >> >>
> >> >>
> >> >>
> >> >> On 9/3/13 1:26 PM, "Ralph Castain" <rhc_at_[hidden]> wrote:
> >> >>
> >> >>> How does it fail?
> >> >>>
> >> >>> On Sep 3, 2013, at 1:19 PM, "Teranishi, Keita" <knteran_at_[hidden]>
> >> >>>wrote:
> >> >>>
> >> >>>> Nathan,
> >> >>>>
> >> >>>> Thanks for the help. I can run a job using Open MPI, assigning a
> >> >>>> single process per node. However, I have been failing to run a job
> >> >>>> using multiple MPI ranks in a single node. In other words, "mpiexec
> >> >>>> --bind-to-core --npernode 16 --n 16 ./test" never works (aprun -n 16
> >> >>>> works fine). Do you have any thoughts about it?
> >> >>>>
> >> >>>> Thanks,
> >> >>>> ---------------------------------------------
> >> >>>> Keita Teranishi
> >> >>>> R&D Principal Staff Member
> >> >>>> Scalable Modeling and Analysis Systems
> >> >>>> Sandia National Laboratories
> >> >>>> Livermore, CA 94551
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> On 8/30/13 8:49 AM, "Hjelm, Nathan T" <hjelmn_at_[hidden]> wrote:
> >> >>>>
> >> >>>>> Replace install_path to where you want Open MPI installed.
> >> >>>>>
> >> >>>>> ./configure --prefix=install_path
> >> >>>>> --with-platform=contrib/platform/lanl/cray_xe6/optimized-lustre
> >> >>>>> make
> >> >>>>> make install
> >> >>>>>
> >> >>>>> To use Open MPI just set the PATH and LD_LIBRARY_PATH:
> >> >>>>>
> >> >>>>> PATH=install_path/bin:$PATH
> >> >>>>> LD_LIBRARY_PATH=install_path/lib:$LD_LIBRARY_PATH
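Put together, the build-and-run sequence described above might look like the following sketch. The install prefix, job size, and test program names are placeholders; the PATH settings are written with export so that the compiler wrappers and launcher inherit them.

```shell
# Hypothetical end-to-end build of Open MPI with a LANL platform file.
./configure --prefix="$HOME/opt/openmpi" \
    --with-platform=contrib/platform/lanl/cray_xe6/optimized-lustre
make -j 8             # parallel build; adjust to taste
make install

# Make the new installation visible to this shell and its children.
export PATH="$HOME/opt/openmpi/bin:$PATH"
export LD_LIBRARY_PATH="$HOME/opt/openmpi/lib:$LD_LIBRARY_PATH"

mpicc -o test test.c  # compile with the Open MPI wrapper compiler
mpirun -n 16 ./test   # launch with mpirun (preferred over aprun at scale)
```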
> >> >>>>>
> >> >>>>> You can then use mpicc, mpicxx, mpif90, etc. to compile, and either
> >> >>>>> mpirun or aprun to run. If you are running at scale I would recommend
> >> >>>>> against using aprun for now. I also recommend you change your
> >> >>>>> programming environment to either PrgEnv-gnu or PrgEnv-intel. The PGI
> >> >>>>> compiler can be a PIA. It is possible to build with the Cray compiler,
> >> >>>>> but it takes patching config.guess and changing some autoconf stuff.
> >> >>>>>
> >> >>>>> -Nathan
> >> >>>>>
> >> >>>>> Please excuse the horrible Outlook-style quoting.
> >> >>>>> ________________________________________
> >> >>>>> From: users [users-bounces_at_[hidden]] on behalf of Teranishi,
> >> >>>>>Keita
> >> >>>>> [knteran_at_[hidden]]
> >> >>>>> Sent: Thursday, August 29, 2013 8:01 PM
> >> >>>>> To: Open MPI Users
> >> >>>>> Subject: Re: [OMPI users] [EXTERNAL] Re: What version of PMI (Cray
> >> >>>>>XE6)
> >> >>>>> is working for OpenMPI-1.6.5?
> >> >>>>>
> >> >>>>> Thanks for the info. Is it still possible to build it myself? What
> >> >>>>> is the procedure other than the configure script?
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>> On 8/23/13 2:37 PM, "Nathan Hjelm" <hjelmn_at_[hidden]> wrote:
> >> >>>>>
> >> >>>>>> On Fri, Aug 23, 2013 at 09:14:25PM +0000, Teranishi, Keita wrote:
> >> >>>>>>> Hi,
> >> >>>>>>> I am trying to install Open MPI 1.6.5 on Cray XE6 and am very
> >> >>>>>>> curious about the current support of PMI. In the previous
> >> >>>>>>> discussions, there was a comment on the version of PMI (it works
> >> >>>>>>> with 2.1.4, but fails with 3.0). Our
> >> >>>>>>
> >> >>>>>> Open MPI 1.6.5 does not have support for the XE-6. Use 1.7.2
> >> >>>>>>instead.
> >> >>>>>>
> >> >>>>>>> machine has PMI 2.1.4 and PMI 4.0 (default). Which version do you
> >> >>>>>>
> >> >>>>>> There was a regression in PMI 3.x.x that still exists in 4.0.x that
> >> >>>>>> causes a warning to be printed on every rank when using mpirun. We
> >> >>>>>> are working with Cray to resolve the issue. For now use 2.1.4. See
> >> >>>>>> the platform files in contrib/platform/lanl/cray_xe6. The platform
> >> >>>>>> files you would want to use are debug-lustre or optimized-lustre.
> >> >>>>>>
> >> >>>>>> BTW, 1.7.2 is installed on Cielo and Cielito. Just run:
> >> >>>>>>
> >> >>>>>> module swap PrgEnv-pgi PrgEnv-gnu (PrgEnv-intel also works)
> >> >>>>>> module unload cray-mpich2 xt-libsci
> >> >>>>>> module load openmpi/1.7.2
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> -Nathan Hjelm
> >> >>>>>> Open MPI Team, HPC-3, LANL
> >> >>>>>> _______________________________________________
> >> >>>>>> users mailing list
> >> >>>>>> users_at_[hidden]
> >> >>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >> >>>>>
> >> >>>>
> >> >>>
> >> >>
> >> >
> >>
>