Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [OMPI svn] svn:open-mpi r18804
From: Aurélien Bouteiller (bouteill_at_[hidden])
Date: 2008-07-03 16:50:59


Thanks Ralph, this fix does the trick.

Aurelien

On 3 Jul 08, at 13:53, rhc_at_[hidden] wrote:

> Author: rhc
> Date: 2008-07-03 13:53:37 EDT (Thu, 03 Jul 2008)
> New Revision: 18804
> URL: https://svn.open-mpi.org/trac/ompi/changeset/18804
>
> Log:
> Repair the MPI-2 dynamic operations. This includes:
>
> 1. repair of the linear and direct routed modules
>
> 2. repair of the ompi/pubsub/orte module to correctly init routes to
> the ompi-server, and to handle a failure to parse the provided
> ompi-server URI
>
> 3. modification of orterun to accept both "file" and "FILE" for
> designating where the ompi-server URI is to be found - purely a
> convenience feature
>
> 4. resolution of a message ordering problem during the connect/
> accept handshake that allowed the "send-first" proc to attempt to
> send to the "recv-first" proc before the HNP had actually updated
> its routes.
>
> Let this be a further reminder to all - message ordering is NOT
> guaranteed in the OOB
>
> 5. Repair the ompi/dpm/orte module to correctly init routes during
> connect/accept.
>
> Reminder to all: messages sent to procs in another job family (i.e.,
> started by a different mpirun) are ALWAYS routed through the
> respective HNPs. As per the comments in orte/routed, this is
> REQUIRED to maintain connect/accept (where only the root proc on
> each side is capable of init'ing the routes), to allow communication
> between mpiruns using different routing modules, and to minimize
> connections on tools such as ompi-server. The OOB takes care of all
> of this "under the covers" to ensure that a route back to the
> sender is maintained, even when the different mpiruns are using
> different routed modules.
>
> 6. corrections in the orte/odls to ensure proper identification of
> daemons participating in a dynamic launch
>
> 7. corrections in build/nidmap to support update of an existing
> nidmap during dynamic launch
>
> 8. corrected implementation of the update_arch function in the ESS,
> along with consolidation of a number of ESS operations into base
> functions for easier maintenance. The ability to support info from
> multiple jobs was added, although we don't currently do so - this
> will come later to support further fault recovery strategies
>
> 9. minor updates to several functions to remove unnecessary and/or
> no-longer-used variables and envars, add some debugging output, etc.
>
> 10. addition of a new macro ORTE_PROC_IS_DAEMON that resolves to
> true if the provided proc is a daemon
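
The routing reminder and item 10 above can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: the `sketch_` names, the 16/16-bit split of the jobid into job family and local job id, and the convention that daemons live in local job 0 with the HNP at vpid 0. It is not the code from r18804 and does not reproduce the real ORTE_PROC_IS_DAEMON definition; it only shows the rule that a message to another job family goes to the sender's own HNP first, then to the other side's HNP.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical process name; NOT the real ORTE name type. */
typedef struct {
    uint32_t jobid;  /* assumed layout: job family in the high 16 bits,
                        local job id in the low 16 bits */
    uint32_t vpid;   /* rank of the proc within its job */
} sketch_name_t;

#define SKETCH_JOB_FAMILY(j)   (((j) >> 16) & 0xffffu)
#define SKETCH_LOCAL_JOBID(j)  ((j) & 0xffffu)

/* Assumption: daemons (and the HNP) belong to local job 0 of a family. */
#define SKETCH_PROC_IS_DAEMON(n)  (0 == SKETCH_LOCAL_JOBID((n)->jobid))

/* Build a name from its three parts. */
static sketch_name_t sketch_make_name(uint32_t family, uint32_t local,
                                      uint32_t vpid)
{
    sketch_name_t n;
    assert(family <= 0xffffu && local <= 0xffffu);
    n.jobid = (family << 16) | local;
    n.vpid = vpid;
    return n;
}

/* Pick the next hop for a message that 'me' is sending or relaying
 * toward 'target'. Within one job family the sketch delivers directly;
 * across families the message always goes through the HNPs. */
static sketch_name_t sketch_next_hop(const sketch_name_t *me,
                                     const sketch_name_t *target)
{
    uint32_t my_family = SKETCH_JOB_FAMILY(me->jobid);
    uint32_t their_family = SKETCH_JOB_FAMILY(target->jobid);

    if (my_family == their_family) {
        return *target;
    }
    if (SKETCH_PROC_IS_DAEMON(me) && 0 == me->vpid) {
        /* I am my family's HNP: hand off to the other family's HNP. */
        return sketch_make_name(their_family, 0, 0);
    }
    /* Any other proc sends to its own HNP first. */
    return sketch_make_name(my_family, 0, 0);
}
```

Calling `sketch_next_hop` repeatedly, with each returned hop as the next `me`, walks the path app proc, local HNP, remote HNP, target. A real routed module would also pick a route inside a job family (linear, binomial, direct), which this sketch leaves out.
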
>
> There is still more cleanup to be done for efficiency, but this at
> least works.
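
For item 3 above, here is a rough sketch of how accepting both "file" and "FILE" might look. The helper name `sketch_resolve_server_uri`, the `file:<path>` syntax with a colon, and the choice to read only the first line of the file are all assumptions for illustration; this is not the orterun code from the changeset. It also fails cleanly when the URI cannot be obtained, in the spirit of item 2.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: resolve the ompi-server argument into a URI.
 * If 'arg' starts with "file:" or "FILE:", the rest is taken as a path
 * and the URI is read from the first line of that file. Otherwise 'arg'
 * is treated as the URI itself. Returns 0 on success, -1 on failure. */
static int sketch_resolve_server_uri(const char *arg, char *uri, size_t len)
{
    assert(NULL != uri && len > 0);

    if (NULL == arg) {
        return -1;
    }

    if (0 == strncmp(arg, "file:", 5) || 0 == strncmp(arg, "FILE:", 5)) {
        const char *path = arg + 5;
        FILE *fp;

        if ('\0' == *path) {
            return -1;              /* prefix given, but no filename */
        }
        fp = fopen(path, "r");
        if (NULL == fp) {
            return -1;              /* file missing or unreadable */
        }
        if (NULL == fgets(uri, (int)len, fp)) {
            fclose(fp);
            return -1;              /* empty file */
        }
        fclose(fp);
        uri[strcspn(uri, "\r\n")] = '\0';   /* strip trailing newline */
        return ('\0' == uri[0]) ? -1 : 0;
    }

    /* No file prefix: copy the argument through as the URI. */
    if (strlen(arg) >= len) {
        return -1;                  /* would not fit in the buffer */
    }
    strcpy(uri, arg);
    return 0;
}
```

A caller would pass the command-line value and a buffer, then abort the connection attempt with an error message if the helper returns -1 instead of going on with an unparsed URI.
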
>
> Tested on single-node Mac, multi-node SLURM via odin. Tests included
> connect/accept, publish/lookup/unpublish, comm_spawn,
> comm_spawn_multiple, and singleton comm_spawn.
>
> Fixes ticket #1256
>
>
>
> Added:
> trunk/orte/mca/ess/base/ess_base_nidmap.c
> Removed:
> trunk/orte/mca/ess/base/ess_base_build_nidmap.c
> Text files modified:
> trunk/ompi/attribute/attribute_predefined.c | 13
> trunk/ompi/mca/dpm/base/base.h | 1
> trunk/ompi/mca/dpm/base/dpm_base_null_fns.c | 5
> trunk/ompi/mca/dpm/base/dpm_base_open.c | 1
> trunk/ompi/mca/dpm/dpm.h | 7
> trunk/ompi/mca/dpm/orte/dpm_orte.c | 494 ++++++++++++++++++++++-----------------
> trunk/ompi/mca/pubsub/orte/pubsub_orte.c | 14
> trunk/ompi/proc/proc.c | 1
> trunk/orte/mca/ess/alps/ess_alps_module.c | 163 +++++--------
> trunk/orte/mca/ess/base/Makefile.am | 2
> trunk/orte/mca/ess/base/base.h | 12
> trunk/orte/mca/ess/base/ess_base_get.c | 9
> trunk/orte/mca/ess/base/ess_base_put.c | 8
> trunk/orte/mca/ess/env/ess_env_module.c | 144 +++++------
> trunk/orte/mca/ess/hnp/ess_hnp_module.c | 2
> trunk/orte/mca/ess/lsf/ess_lsf_module.c | 138 +++++-----
> trunk/orte/mca/ess/singleton/ess_singleton_module.c | 182 +++++++------
> trunk/orte/mca/ess/slurm/ess_slurm_module.c | 136 +++++-----
> trunk/orte/mca/ess/tool/ess_tool_module.c | 2
> trunk/orte/mca/grpcomm/bad/grpcomm_bad_module.c | 22 +
> trunk/orte/mca/grpcomm/base/grpcomm_base_modex.c | 13
> trunk/orte/mca/odls/base/odls_base_default_fns.c | 52 ++--
> trunk/orte/mca/odls/base/odls_base_open.c | 8
> trunk/orte/mca/odls/base/odls_private.h | 4
> trunk/orte/mca/rml/base/rml_base_receive.c | 21 +
> trunk/orte/mca/rml/rml_types.h | 2
> trunk/orte/mca/routed/binomial/routed_binomial.c | 192 +++++++++++++--
> trunk/orte/mca/routed/direct/routed_direct.c | 316 ++++++++++++++++++------
> trunk/orte/mca/routed/linear/routed_linear.c | 198 +++++++++++++--
> trunk/orte/runtime/orte_globals.h | 15 +
> trunk/orte/runtime/orte_globals_class_instances.h | 51 ++++
> trunk/orte/test/mpi/accept.c | 1
> trunk/orte/tools/orterun/orterun.c | 3
> trunk/orte/util/name_fns.h | 4
> trunk/orte/util/nidmap.c | 105 ++++----
> trunk/orte/util/nidmap.h | 3
> trunk/orte/util/proc_info.c | 10
> trunk/orte/util/proc_info.h | 5
> 38 files changed, 1443 insertions(+), 916 deletions(-)
>
>
> Diff not shown due to size (154082 bytes).
> To see the diff, run the following command:
>
> svn diff -r 18803:18804 --no-diff-deleted