On Fri, 2010-01-22 at 08:12 -0700, Ralph Castain wrote:
> For SLURM, there is a config file where you can specify what gets propagated. It is clearly an error to include hostname as it messes many things up, not just OMPI. Frankly, I've never seen someone do that on SLURM.
I'm going to check that.
> I believe in this case OMPI is likely incorrectly picking up the environment and propagating it. We know this is incorrectly happening on Torque, and it appears to also be happening on SLURM. This is a bug that I will be fixing on Torque - and as soon as Nadia confirms, on SLURM as well.
> I know that on Torque it was an innocent mistake where a line got added to the launch code that shouldn't have...
> On Jan 22, 2010, at 8:07 AM, N.M. Maclaren wrote:
> > On Jan 22 2010, Nadia Derbey wrote:
> >> I'm wondering whether the HOSTNAME environment variable shouldn't be
> >> handled as a "special case" when the orted daemons launch the remote
> >> jobs. This particularly applies to batch schedulers where the caller's
> >> environment is copied to the remote job: we are inheriting a $HOSTNAME
> >> which is the name of the host mpirun was called from:
> > This is slightly orthogonal, but relevant.
> > This is an ancient mess with propagating environment variables, and predates
> > MPI by many years. The most traditional form was the demented connexion
> > protocols that propagated TERM - truly wonderful when logging in from SunOS
> > to HP-UX! Whether it is worth kludging up one variable and leaving the rest
> > is unclear.
> > Even if systems are fairly homogeneous, it is common for the head node to
> > have a different set of standard values from the others. TMPDIR is one
> > very common one, but any of the dozen or so path variables is likely to
> > vary, at least sometimes, as are many of the others.
> > I used to have to write the most DISGUSTING hacks to stop unwanted export
> > when I managed our supercomputer. Yet there are other systems that will
> > work only if you DO export environment variables. And there are systems
> > where the secondary nodes aren't real systems, and using the parent hostname
> > would be better, though I haven't managed any.
> > Realistically, there should be some kind of hook to control which
> > are transferred and which are not. I haven't found one - if there is, it's
> > a better way to tackle this.
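To make the idea of such a hook concrete, here is a minimal sketch in Python of a filter applied to the caller's environment before it is handed to a remote launcher. This is only an illustration under stated assumptions: it is not how OMPI, SLURM or Torque handle propagation, and the names filter_env, HOST_SPECIFIC and include_only are made up for the example. The default exclusion list simply reuses the variables mentioned in this thread.

```python
# Hypothetical default list: variables that describe the launching host
# rather than the job (taken from the examples in this thread).
HOST_SPECIFIC = frozenset({"HOSTNAME", "TMPDIR", "TERM"})


def filter_env(env, exclude=HOST_SPECIFIC, include_only=None):
    """Return a filtered copy of env for use on a remote node.

    exclude      -- names that are never propagated
    include_only -- if given, only these names are propagated
                    (for sites that want to export almost nothing)
    """
    result = {}
    for name, value in env.items():
        if name in exclude:
            # Host-specific value: let the remote node keep its own.
            continue
        if include_only is not None and name not in include_only:
            continue
        result[name] = value
    return result
```

Calling filter_env(dict(os.environ)) on the head node would give an environment without HOSTNAME, TMPDIR and TERM, so the remote side keeps its own values. Passing include_only covers the opposite kind of site, where only a short list of variables should be exported at all.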
> > Regards,
> > Nick Maclaren.
> > _______________________________________________
> > devel mailing list
> > devel_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
Nadia Derbey <Nadia.Derbey_at_[hidden]>