Yeah, my fix won't work for big endian machines - this is going to be an issue across the code base now, so we'll have to troll and fix it. I was doing the minimal change required to fix the trunk in the meantime.

On Jul 30, 2014, at 9:06 AM, George Bosilca <> wrote:

Yes. opal_process_name_t has basically no meaning by itself, it is a 64 bits storage location used by the upper layer to save some local key that can be later used to extract information. Calling the OPAL level compare function might be a better fit there.


On Wed, Jul 30, 2014 at 11:50 AM, Gilles Gouaillardet <> wrote:

was it really that simple ?

proc_temp->super.proc_name has type opal_process_name_t :
typedef opal_identifier_t opal_process_name_t;
typedef uint64_t opal_identifier_t;


item_ptr->peer has type orte_process_name_t :
struct orte_process_name_t {
   orte_jobid_t jobid;
   orte_vpid_t vpid;

bottom line, is r32357 still valid on a big endian arch ?



On Wed, Jul 30, 2014 at 11:49 PM, Ralph Castain <> wrote:
I just fixed this one - all that was required was an ampersand as the name was being passed into the function instead of a pointer to the name


On Jul 30, 2014, at 7:43 AM, Gilles GOUAILLARDET <> wrote:


r32353 can be seen as a suspect...
Even if it is correct, it might have exposed the bug discussed in #4815 even more (e.g. we hit the bug 100% after the fix)

does the attached patch to #4815 fixes the problem ?

If yes, and if you see this issue as a showstopper, feel free to commit it and drop a note to #4815
( I am afk until tomorrow)



Rolf vandeVaart <> wrote:

Just an FYI that my trunk version (r32355) does not work at all anymore if I do not include "--mca coll ^ml".    Here is a stack trace from the ibm/pt2pt/send test running on a single node.


(gdb) where

#0  0x00007f6c0d1321d0 in ?? ()

#1  <signal handler called>

#2  0x00007f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017', name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522

#3  0x00007f6c0bea17be in bcol_basesmuma_smcm_allgather_connection (sm_bcol_module=0x7f6bf3b68040, module=0xb3d200, peer_list=0x7f6c0c0a6748, back_files=0x7f6bf3ffd6c8,

    comm=0x6037a0, input=..., base_fname=0x7f6c0bea2606 "sm_payload_mem_", map_all=false) at ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c:237

#4  0x00007f6c0be98307 in bcol_basesmuma_bank_init_opti (payload_block=0xbc0f60, data_offset=64, bcol_module=0x7f6bf3b68040, reg_data=0xba28c0)

    at ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_buf_mgmt.c:302

#5  0x00007f6c0cced386 in mca_coll_ml_register_bcols (ml_module=0xba5c40) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:510

#6  0x00007f6c0cced68f in ml_module_memory_initialization (ml_module=0xba5c40) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:558

#7  0x00007f6c0ccf06b1 in ml_discover_hierarchy (ml_module=0xba5c40) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:1539

#8  0x00007f6c0ccf4e0b in mca_coll_ml_comm_query (comm=0x6037a0, priority=0x7fffe7991b58) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:2963

#9  0x00007f6c18cc5b09 in query_2_0_0 (component=0x7f6c0cf50940, comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90)

    at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:372

#10 0x00007f6c18cc5ac8 in query (component=0x7f6c0cf50940, comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90)

    at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:355

#11 0x00007f6c18cc59d2 in check_one_component (comm=0x6037a0, component=0x7f6c0cf50940, module=0x7fffe7991b90)

    at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:317

#12 0x00007f6c18cc5818 in check_components (components=0x7f6c18f46ef0, comm=0x6037a0) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:281

#13 0x00007f6c18cbe3c9 in mca_coll_base_comm_select (comm=0x6037a0) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:117

#14 0x00007f6c18c52301 in ompi_mpi_init (argc=1, argv=0x7fffe79924c8, requested=0, provided=0x7fffe79922e8) at ../../ompi/runtime/ompi_mpi_init.c:918

#15 0x00007f6c18c86e92 in PMPI_Init (argc=0x7fffe799234c, argv=0x7fffe7992340) at pinit.c:84

#16 0x0000000000401056 in main (argc=1, argv=0x7fffe79924c8) at send.c:32

(gdb) up

#1  <signal handler called>

(gdb) up

#2  0x00007f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017', name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522

522           if (name1->jobid < name2->jobid) {

(gdb) print name1

$1 = (const orte_process_name_t *) 0x192350001

(gdb) print *name1

Cannot access memory at address 0x192350001

(gdb) print name2

$2 = (const orte_process_name_t *) 0xbaf76c

(gdb) print *name2

$3 = {jobid = 2452946945, vpid = 1}





>-----Original Message-----

>From: devel [] On Behalf Of Gilles


>Sent: Wednesday, July 30, 2014 2:16 AM

>To: Open MPI Developers

>Subject: Re: [OMPI devel] trunk compilation errors in jenkins




>#4815 is indirectly related to the move :


>in bcol/basesmuma, we used to compare ompi_process_name_t, and now

>we (try to) compare an ompi_process_name_t and an opal_process_name_t

>(which causes a glory SIGSEGV)


>i proposed a temporary patch which is both broken and unelegant, could you

>please advise a correct solution ?






>On 2014/07/27 7:37, George Bosilca wrote:

>> If you have any issue with the move, Iíll be happy to help and/or support

>you on your last move toward a completely generic BTL. To facilitate your

>work I exposed a minimalistic set of OMPI information at the OPAL level. Take

>a look at opal/util/proc.h for more info, but please try not to expose more.



>devel mailing list



>Link to this post:


This email message is for the sole use of the intended recipient(s) and may contain confidential information.  Any unauthorized review, use, disclosure or distribution is prohibited.  If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.

devel mailing list
Link to this post:

devel mailing list
Link to this post:

devel mailing list
Link to this post:

devel mailing list
Link to this post: