Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-05-04 10:27:00


If there's a sleep(1) in the run-time test, that would be an annoying source of delay in job startup. It's not a deal-breaker, but it would be nice(r) if there were a "fast" run-time check that could run during the sysv selection logic (i.e., sysv could disqualify itself if the feature is not available at run-time). Keep in mind that the run-time check will run in parallel across the whole job, so it adds a (more or less) constant amount of time to job startup.
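
To make the "fast check" idea concrete, here is a minimal sketch of a probe that involves no sleep and no fork. It only verifies that a small private System V segment can be created, attached, and written. It deliberately does not test attaching from a second process after the segment has been marked for removal, so it is weaker than the test discussed in this thread. The function name sysv_shmem_probe is hypothetical and is not taken from the Open MPI code base.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Hypothetical fast probe (not Open MPI code).
 * Returns 1 if a private SysV segment of `size` bytes can be created,
 * attached, and written by this process; returns 0 otherwise.
 */
static int sysv_shmem_probe(size_t size)
{
    int ok = 0;

    /* IPC_PRIVATE always creates a new segment, so no key can collide. */
    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | IPC_EXCL | 0600);
    if (id < 0) {
        return 0;   /* no SysV support, limits exhausted, or invalid size */
    }

    void *addr = shmat(id, NULL, 0);

    /* Mark for removal right away so the segment cannot leak if we die.
     * The segment is destroyed once the last attachment goes away. */
    shmctl(id, IPC_RMID, NULL);

    if (addr != (void *) -1) {
        volatile char *p = (volatile char *) addr;
        p[0] = 1;               /* touch the memory to confirm it is usable */
        ok = (p[0] == 1);
        shmdt(addr);
    }
    return ok;
}
```

A component could call something like sysv_shmem_probe(4096) during selection and disqualify itself on a 0 return. Whether this is sufficient depends on which kernel behavior actually needs checking.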

One thing to be careful about with a run-time check: you might not want *all* processes on a box to try to alloc a sysv segment, fork a child, try to connect, etc. On boxen with large core counts (someone here at the Forum cited a 96-core box), you might run out of sysv shmem segments if all procs try the test, and/or run into OS serialization issues. So you might want to have local rank 0 (or the orted? ...but that wouldn't work for srun / direct launch, etc.) do the test and communicate the results to the rest of the local procs -- maybe in the modex?
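
The "only local rank 0 probes" idea can be sketched roughly as follows. Everything here is hypothetical: struct node_board is just a stand-in for whatever mechanism would really carry the result to the other local procs (the modex or otherwise), the stub probes exist only for illustration, and simulate_node runs the "procs" one after another in a loop rather than concurrently.

```c
/* Tri-state result shared by the processes on one node. */
enum sysv_state { SYSV_UNKNOWN = -1, SYSV_NO = 0, SYSV_YES = 1 };

/* Hypothetical stand-in for the real exchange mechanism (e.g., the modex). */
struct node_board {
    enum sysv_state sysv;   /* published probe result */
    int probes_run;         /* how many procs actually ran the probe */
};

/* Every local proc calls this; only local rank 0 pays for the probe. */
static void probe_as_leader(int local_rank, int (*probe)(void),
                            struct node_board *board)
{
    if (local_rank != 0) {
        return;             /* non-leaders just consume the published result */
    }
    board->probes_run++;
    board->sysv = probe() ? SYSV_YES : SYSV_NO;
}

/* Stub probes, for illustration only. */
static int stub_probe_ok(void)   { return 1; }
static int stub_probe_fail(void) { return 0; }

/* Simulate nprocs local procs on one node, one after another. */
static struct node_board simulate_node(int nprocs, int (*probe)(void))
{
    struct node_board board = { SYSV_UNKNOWN, 0 };
    for (int rank = 0; rank < nprocs; rank++) {
        probe_as_leader(rank, probe, &board);
    }
    return board;
}
```

With 96 simulated procs, the probe runs once and every proc sees the same answer. In a real job the non-leaders would have to wait for the published value, which this sequential sketch glosses over.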

On May 4, 2010, at 9:14 AM, N.M. Maclaren wrote:

> On May 4 2010, Terry Dontje wrote:
> >Ralph Castain wrote:
> >>
> >>> Is a configure-time test good enough? For example, are all Linuxes
> >>> the same in this regard? That is, if you built OMPI on RH and it
> >>> configured in the new SysV SM, will those bits actually run correctly
> >>> on other Linux systems? I think Jeff hinted at this when suggesting
> >>> that this may need to be a run-time test.
> >>
> >> I don't think we have ever enforced that requirement, nor am I sure
> >> the current code would meet it. We have a number of components that
> >> test for ability to build, but don't check again at run-time.
> >>
> >> Generally, the project has followed the philosophy of "build on the
> >> system you intend to run on".
> >>
> >There is at least one binary distribution that is built on one Linux
> >and can be installed on several others. That is the reason I
> >bring up the above. The community can take the stance that that one
> >distribution does not matter for this case, or that it needs to handle
> >this on its own. In the grand scheme of things it might not matter, but
> >I wanted to at least stand up and be heard.
>
> There is a gradation involved. Building on one distribution and using
> on another is one thing. But the same distribution can use differently
> built kernels, and the same system can be reconfigured (including both
> package updating and parameter changing). It is highly undesirable to
> use volatile parameters in a non-volatile context.
>
> A lot of applications need rebuilding when the administrator updates
> packages or makes configuration changes; that's not good and should be
> avoided if at all possible. Given the way that systems are currently
> configured, and the design of the autoconfigure mechanism, it's probably
> not wholly avoidable. But it's still a very nasty gotcha.
>
>
> Regards,
> Nick Maclaren.

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/