Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Problem with the openmpi-default-hostfile (on the trunk)
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-02-28 04:54:15


I'll see what I can do when next I have access to a slurm machine - hopefully in a day or two.

Are you sure you are at the top of the trunk? I reviewed the code, and it clearly detects that the default hostfile is empty and ignores it if so. Like I said, I'm not seeing this behavior, and neither are the slurm machines on MTT.
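
The behavior described above (ignore the default hostfile when it is empty, or when the param is set to "none") can be illustrated with a minimal Python sketch. This is not the ORTE code: the function names are hypothetical, and treating a file that holds only blank lines and `#` comments as "empty" is an assumption made for the illustration.

```python
def hostfile_has_hosts(text: str) -> bool:
    """Return True if any line still names a host once comments are dropped."""
    for line in text.splitlines():
        # Assumption: '#' starts a comment and blank lines carry no host entry.
        entry = line.split("#", 1)[0].strip()
        if entry:
            return True
    return False


def select_default_hostfile(param_value, text):
    """Hypothetical decision helper.

    Returns the hostfile path to enforce, or None when the launcher should
    fall back to the resource manager's allocation (for example, slurm).
    """
    if param_value is None or param_value == "none":
        return None  # the user asked for the default hostfile to be ignored
    if not hostfile_has_hosts(text):
        return None  # an empty default hostfile is ignored
    return param_value


if __name__ == "__main__":
    print(select_default_hostfile("etc/openmpi-default-hostfile", ""))
    print(select_default_hostfile("etc/openmpi-default-hostfile", "node01 slots=4\n"))
```

With an empty file the helper returns `None`, which is the case where the slurm allocation would be expected to apply.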

On Feb 28, 2012, at 1:25 AM, pascal.deveze_at_[hidden] wrote:

>
> devel-bounces_at_[hidden] wrote on 27/02/2012 15:53:06:
>
> > From: Ralph Castain <rhc_at_[hidden]>
> > To: Open MPI Developers <devel_at_[hidden]>
> > Date: 27/02/2012 16:17
> > Subject: Re: [OMPI devel] Problem with the openmpi-default-hostfile
> > (on the trunk)
> > Sent by: devel-bounces_at_[hidden]
> >
> > That's strange - I run on slurm frequently and never have this
> > problem, and my default hostfile is present and empty. Do you have
> > anything in your default mca param file that might be telling us to
> > use the hostfile?
> >
> > The only way I can find to get that behavior is if your default mca
> > param file includes the orte_default_hostfile value. In that case,
> > you are telling us to use the default hostfile, and so we will enforce it.
>
> Hi Ralph,
>
> On my side, the default value of orte_default_hostfile points to etc/openmpi-default-hostfile.
> The command ompi_info -a gives:
>
> MCA orte: parameter "orte_default_hostfile" (current value: <..../etc/openmpi-default-hostfile>, data source: default value)
> Name of the default hostfile (relative or absolute path, "none" to ignore environmental or default MCA param setting)
>
> The following files are empty:
> - .../etc/openmpi-mca-params.conf
> - $HOME/.openmpi/mca-params.conf
> Another solution for me is to put "orte_default_hostfile=none" in one of these files.
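
One way to double-check that neither param file sets orte_default_hostfile is to parse them. The Python sketch below is illustrative only: it assumes the files contain simple `name = value` lines with `#` comments, the helper names are hypothetical, and the paths are the ones quoted above, used purely as labels.

```python
def read_mca_params(text: str) -> dict:
    """Parse 'name = value' lines, skipping blanks and '#' comments."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def files_setting(name: str, files: dict) -> dict:
    """Map each file label to the value it assigns to `name`, if any."""
    found = {}
    for label, text in files.items():
        params = read_mca_params(text)
        if name in params:
            found[label] = params[name]
    return found


if __name__ == "__main__":
    # Both files are empty in the report above, so nothing should be found.
    files = {
        ".../etc/openmpi-mca-params.conf": "",
        "$HOME/.openmpi/mca-params.conf": "",
    }
    print(files_setting("orte_default_hostfile", files))
```

If one of the files contained the line `orte_default_hostfile=none`, the helper would report that file together with the value "none".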
>
> Pascal
>
> >
> > On Feb 27, 2012, at 5:57 AM, pascal.deveze_at_[hidden] wrote:
> >
> > Hi all,
> >
> > I have problems with the openmpi-default-hostfile since the
> > following patch on the trunk
> >
> > changeset: 19874:088fc6c84a9f
> > user: rhc
> > date: Wed Feb 01 17:40:44 2012 +0000
> > summary: In accordance with prior releases, we are supposed to
> > default to looking at the openmpi-default-hostfile as a default
> > hostfile. Restore that behavior, but ignore the file if it is empty.
> > Allow the user to ignore any MCA param setting pointing to a default
> > hostfile by setting the param to "none" (via cmd line or whatever) -
> > this allows them to override a setting in the system default MCA param file.
> >
> > According to the summary of this patch, the openmpi-default-hostfile
> > is ignored if it is empty.
> > But, when I run my jobs with slurm + mpirun, I get the following message:
> > --------------------------------------------------------------------------
> > No nodes are available for this job, either due to a failure to
> > allocate nodes to the job, or allocated nodes being marked
> > as unavailable (e.g., down, rebooting, or a process attempting
> > to be relocated to another node when none are available).
> > --------------------------------------------------------------------------
> >
> > I am able to run my job if I do any one of the following:
> > - put my node(s) in the file etc/openmpi-default-hostfile
> > - use "-mca orte_default_hostfile none" on the mpirun command line
> > - set "export OMPI_MCA_orte_default_hostfile=none" in my environment
> >
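
The command-line and environment workarounds listed above can be sketched in Python as follows. This is only an illustration: the helper names and the program name `./my_app` are hypothetical. It relies on mpirun taking an MCA param as a separate name and value after `-mca`, and on the `OMPI_MCA_` prefix for setting a param through the environment.

```python
import os

PARAM = "orte_default_hostfile"


def mpirun_command(program_args, hostfile_override=None):
    """Build an mpirun argument list, optionally overriding the default hostfile."""
    cmd = ["mpirun"]
    if hostfile_override is not None:
        # The param name and its value are passed as two separate arguments.
        cmd += ["-mca", PARAM, hostfile_override]
    return cmd + list(program_args)


def mpirun_environment(base_env=None, hostfile_override=None):
    """Return a copy of the environment with the MCA param set, if requested."""
    env = dict(os.environ if base_env is None else base_env)
    if hostfile_override is not None:
        env["OMPI_MCA_" + PARAM] = hostfile_override
    return env


if __name__ == "__main__":
    # "./my_app" is a placeholder program name.
    print(mpirun_command(["./my_app"], hostfile_override="none"))
    print(mpirun_environment({}, hostfile_override="none"))
```

Either result could be handed to `subprocess.run`; only one of the two overrides is needed for a given launch.
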
> > It appears that an empty openmpi-default-hostfile is not ignored.
> > This patch does not seem to be complete.
> >
> > Or am I misunderstanding something?
> >
> > Pascal Devèze
> > _______________________________________________
> > devel mailing list
> > devel_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel