Open MPI Development Mailing List Archives

Subject: [OMPI devel] Trunk is broken
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-12-24 13:39:53

Hi folks

This is a heads-up to all: It appears a recent commit has broken the trunk - I think it relates to something done to the MCA parameter system. When running across multiple nodes, the daemons segfault on finalize with a stacktrace of:

(gdb) where
#0 0x0000003dc4477e92 in _int_free () from /lib64/
#1 0x00007f18a163f756 in param_destructor (p=0x118d940) at mca_base_param.c:1982
#2 0x00007f18a163ab41 in opal_obj_run_destructors (object=0x118d940) at ../../../opal/class/opal_object.h:448
#3 0x00007f18a163cb94 in mca_base_param_finalize () at mca_base_param.c:853
#4 0x00007f18a1609c06 in opal_finalize_util () at runtime/opal_finalize.c:69
#5 0x00007f18a1609cbc in opal_finalize () at runtime/opal_finalize.c:155
#6 0x00007f18a18e366b in orte_finalize () at runtime/orte_finalize.c:107
#7 0x00007f18a1911313 in orte_daemon (argc=35, argv=0x7ffffd7ea8b8) at orted/orted_main.c:834
#8 0x000000000040091a in main (argc=35, argv=0x7ffffd7ea8b8) at orted.c:62
(gdb) up
#1 0x00007f18a163f756 in param_destructor (p=0x118d940) at mca_base_param.c:1982
1982 free(p->mbp_env_var_name);

(gdb) print array[i]
$2 = {mbp_super = {obj_magic_id = 0, obj_class = 0x7f18a18c6460, obj_reference_count = 1, cls_init_file_name = 0x7f18a169d04e "mca_base_param.c",
    cls_init_lineno = 1154}, mbp_type = MCA_BASE_PARAM_TYPE_STRING, mbp_type_name = 0x1185110 "\300O\030\001", mbp_component_name = 0x0,
  mbp_param_name = 0x1185130 "", mbp_full_name = 0x1185150 "orte_debugger_test_daemon", mbp_synonyms = 0x0, mbp_internal = false,
  mbp_read_only = false, mbp_deprecated = false, mbp_deprecated_warning_shown = true,
  mbp_help_msg = 0x11850a0 "Name of the executable to be used to simulate a debugger colaunch (relative or absolute path)",
  mbp_env_var_name = 0x1185180 "\020P\030\001", mbp_default_value = {intval = 0, stringval = 0x0}, mbp_file_value_set = false, mbp_file_value = {
    intval = 0, stringval = 0x0}, mbp_source_file = 0x0, mbp_override_value_set = false, mbp_override_value = {intval = 0, stringval = 0x0}}

As you can see, the problem is that the mbp_env_var_name field is trash, so the destructor's attempt to free that field crashes.

I believe it was Nathan who last touched this area, so perhaps he could take a gander and see what happened? In the meantime, I'm afraid the trunk is down.
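
For readers following along, the failure mode described above (a destructor calling free() on a pointer field that holds garbage) can be illustrated with a small, self-contained sketch. This is not the Open MPI code: demo_param_t and the demo_param_* functions are hypothetical names invented for illustration, and the sketch only shows one common defensive convention, assuming the fix is to keep every owned pointer either NULL or pointing at live heap memory. The actual root cause in mca_base_param.c may well be different (for example, memory overwritten after it was set).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a parameter record; NOT the real mca_base_param_t. */
typedef struct {
    char *full_name;     /* owned heap string, or NULL */
    char *env_var_name;  /* owned heap string, or NULL */
} demo_param_t;

/* Constructor: set every owned pointer to NULL so the destructor is always safe. */
static void demo_param_construct(demo_param_t *p)
{
    assert(p != NULL);
    p->full_name = NULL;
    p->env_var_name = NULL;
}

/* Replace the name with a private copy. Returns 0 on success, -1 on allocation failure. */
static int demo_param_set_name(demo_param_t *p, const char *name)
{
    char *copy;

    assert(p != NULL && name != NULL);
    copy = malloc(strlen(name) + 1);
    if (copy == NULL) {
        return -1;
    }
    strcpy(copy, name);
    free(p->full_name);  /* free(NULL) is a no-op in standard C */
    p->full_name = copy;
    return 0;
}

/* Destructor: free owned fields, then reset them so a second call is harmless. */
static void demo_param_destruct(demo_param_t *p)
{
    assert(p != NULL);
    free(p->full_name);
    free(p->env_var_name);
    p->full_name = NULL;
    p->env_var_name = NULL;
}
```

With this convention a field that was never set is simply skipped by free(), whereas a field left uninitialized, or overwritten with stale data as mbp_env_var_name appears to be in the dump above, leads to undefined behavior inside the allocator.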