Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OpenMPI at scale on Cray XK7
From: Nathan Hjelm (hjelmn_at_[hidden])
Date: 2013-04-23 13:45:23


On Tue, Apr 23, 2013 at 10:17:46AM -0700, Ralph Castain wrote:
>
> On Apr 23, 2013, at 10:09 AM, Nathan Hjelm <hjelmn_at_[hidden]> wrote:
>
> > On Tue, Apr 23, 2013 at 12:21:49PM +0400, ???????????????????? ???????????? wrote:
> >> Hi,
> >>
> >> Nathan, could you please advise what the expected startup time is for an OpenMPI
> >> job at such a scale (128K ranks)? I'm interested in
> >> 1) time from mpirun start to completion of MPI_Init()
> >
> > It takes less than a minute to run:
> >
> > mpirun -n 131072 /bin/true
> >
> >
> >> 2) time from MPI_Init() start to completion of MPI_Init()
> >
> > A simple MPI application took about 1.25 minutes to run. If you want to see our setup you can take a look at contrib/platform/lanl/cray_xe6.
> >
> >> From my experience, for a 52800-rank job
> >> 1) took around 20 min
> >> 2) took around 12 min
> >> which actually looks like a hang.
> >
> > How many nodes? I have never seen launch times that bad on Cielo. You could try adding -mca routed debruijn -novm and see if that helps. It will reduce the amount of communication between compute nodes and the login node.
>
> I believe the debruijn module was turned off a while ago due to a bug that wasn't fixed. However, try using
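
(For reference, the kind of trivial timing program implied by the numbers above can be as small as the sketch below; the code and file layout are illustrative only, not the actual test used on Cielo. It reports the wall-clock time spent inside MPI_Init on rank 0.)

  #include <mpi.h>
  #include <stdio.h>
  #include <sys/time.h>

  /* wall-clock time in seconds */
  static double now(void)
  {
      struct timeval tv;
      gettimeofday(&tv, NULL);
      return tv.tv_sec + tv.tv_usec / 1e6;
  }

  int main(int argc, char **argv)
  {
      double t0 = now();              /* entered main(), before MPI_Init */
      MPI_Init(&argc, &argv);
      double t1 = now();              /* MPI_Init has completed */

      int rank;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      if (rank == 0)
          printf("MPI_Init took %.2f seconds\n", t1 - t0);

      MPI_Finalize();
      return 0;
  }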

Was it turned off, or was the priority lowered? If it was lowered, then -mca routed debruijn should work. The -novm option is to avoid the bug (as I understand it). I am working on fixing the bug now in the hope that it will be ready for 1.7.2.
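
Concretely (using the 52800-rank case above as an illustrative size, and ./a.out standing in for whatever test binary is being launched), that suggestion amounts to something like:

  mpirun -mca routed debruijn -novm -n 52800 ./a.out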

-Nathan