
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] trunk compilation errors in jenkins
From: Gilles Gouaillardet (gilles.gouaillardet_at_[hidden])
Date: 2014-07-30 11:50:08


Ralph,

was it really that simple?

proc_temp->super.proc_name has type opal_process_name_t:
typedef opal_identifier_t opal_process_name_t;
typedef uint64_t opal_identifier_t;

*but*

item_ptr->peer has type orte_process_name_t :
struct orte_process_name_t {
   orte_jobid_t jobid;
   orte_vpid_t vpid;
};

Bottom line: is r32357 still valid on a big-endian arch?

Cheers,

Gilles

On Wed, Jul 30, 2014 at 11:49 PM, Ralph Castain <rhc_at_[hidden]> wrote:

> I just fixed this one - all that was required was an ampersand: the name
> was being passed into the function instead of a pointer to the name.
>
> r32357
>
> On Jul 30, 2014, at 7:43 AM, Gilles GOUAILLARDET <
> gilles.gouaillardet_at_[hidden]> wrote:
>
> Rolf,
>
> r32353 can be seen as a suspect...
> Even if it is correct, it might have exposed the bug discussed in #4815
> even more (e.g. we now hit the bug 100% of the time after that fix).
>
> does the attached patch to #4815 fix the problem?
>
> If yes, and if you see this issue as a showstopper, feel free to commit it
> and drop a note to #4815
> ( I am afk until tomorrow)
>
> Cheers,
>
> Gilles
>
> Rolf vandeVaart <rvandevaart_at_[hidden]> wrote:
>
> Just an FYI that my trunk version (r32355) does not work at all anymore if
> I do not include "--mca coll ^ml". Here is a stack trace from the
> ibm/pt2pt/send test running on a single node.
>
>
>
> (gdb) where
>
> #0 0x00007f6c0d1321d0 in ?? ()
>
> #1 <signal handler called>
>
> #2 0x00007f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017',
> name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522
>
> #3 0x00007f6c0bea17be in bcol_basesmuma_smcm_allgather_connection
> (sm_bcol_module=0x7f6bf3b68040, module=0xb3d200, peer_list=0x7f6c0c0a6748,
> back_files=0x7f6bf3ffd6c8,
>
> comm=0x6037a0, input=..., base_fname=0x7f6c0bea2606 "sm_payload_mem_",
> map_all=false) at
> ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c:237
>
> #4 0x00007f6c0be98307 in bcol_basesmuma_bank_init_opti
> (payload_block=0xbc0f60, data_offset=64, bcol_module=0x7f6bf3b68040,
> reg_data=0xba28c0)
>
> at ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_buf_mgmt.c:302
>
> #5 0x00007f6c0cced386 in mca_coll_ml_register_bcols (ml_module=0xba5c40)
> at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:510
>
> #6 0x00007f6c0cced68f in ml_module_memory_initialization
> (ml_module=0xba5c40) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:558
>
> #7 0x00007f6c0ccf06b1 in ml_discover_hierarchy (ml_module=0xba5c40) at
> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:1539
>
> #8 0x00007f6c0ccf4e0b in mca_coll_ml_comm_query (comm=0x6037a0,
> priority=0x7fffe7991b58) at
> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:2963
>
> #9 0x00007f6c18cc5b09 in query_2_0_0 (component=0x7f6c0cf50940,
> comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90)
>
> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:372
>
> #10 0x00007f6c18cc5ac8 in query (component=0x7f6c0cf50940, comm=0x6037a0,
> priority=0x7fffe7991b58, module=0x7fffe7991b90)
>
> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:355
>
> #11 0x00007f6c18cc59d2 in check_one_component (comm=0x6037a0,
> component=0x7f6c0cf50940, module=0x7fffe7991b90)
>
> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:317
>
> #12 0x00007f6c18cc5818 in check_components (components=0x7f6c18f46ef0,
> comm=0x6037a0) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:281
>
> #13 0x00007f6c18cbe3c9 in mca_coll_base_comm_select (comm=0x6037a0) at
> ../../../../ompi/mca/coll/base/coll_base_comm_select.c:117
>
> #14 0x00007f6c18c52301 in ompi_mpi_init (argc=1, argv=0x7fffe79924c8,
> requested=0, provided=0x7fffe79922e8) at
> ../../ompi/runtime/ompi_mpi_init.c:918
>
> #15 0x00007f6c18c86e92 in PMPI_Init (argc=0x7fffe799234c,
> argv=0x7fffe7992340) at pinit.c:84
>
> #16 0x0000000000401056 in main (argc=1, argv=0x7fffe79924c8) at send.c:32
>
> (gdb) up
>
> #1 <signal handler called>
>
> (gdb) up
>
> #2 0x00007f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017',
> name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522
>
> 522 if (name1->jobid < name2->jobid) {
>
> (gdb) print name1
>
> $1 = (const orte_process_name_t *) 0x192350001
>
> (gdb) print *name1
>
> Cannot access memory at address 0x192350001
>
> (gdb) print name2
>
> $2 = (const orte_process_name_t *) 0xbaf76c
>
> (gdb) print *name2
>
> $3 = {jobid = 2452946945, vpid = 1}
>
> (gdb)
>
>
>
>
>
>
>
> >-----Original Message-----
>
> >From: devel [mailto:devel-bounces_at_[hidden]] On Behalf Of Gilles
> >Gouaillardet
>
> >Sent: Wednesday, July 30, 2014 2:16 AM
>
> >To: Open MPI Developers
>
> >Subject: Re: [OMPI devel] trunk compilation errors in jenkins
>
> >
>
> >George,
> >
> >#4815 is indirectly related to the move:
> >
> >in bcol/basesmuma, we used to compare ompi_process_name_t, and now
> >we (try to) compare an ompi_process_name_t and an opal_process_name_t
> >(which causes a glorious SIGSEGV)
> >
> >I proposed a temporary patch which is both broken and inelegant; could you
> >please advise a correct solution?
> >
> >Cheers,
> >
> >Gilles
>
> >
>
> >On 2014/07/27 7:37, George Bosilca wrote:
> >
> >> If you have any issue with the move, I’ll be happy to help and/or support
> >> you on your last move toward a completely generic BTL. To facilitate your
> >> work I exposed a minimalistic set of OMPI information at the OPAL level.
> >> Take a look at opal/util/proc.h for more info, but please try not to
> >> expose more.
>
> >
>
> >_______________________________________________
>
> >devel mailing list
>
> >devel_at_[hidden]
>
> >Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
> >Link to this post: http://www.open-mpi.org/community/lists/devel/2014/07/15348.php
> ------------------------------
> This email message is for the sole use of the intended recipient(s) and
> may contain confidential information. Any unauthorized review, use,
> disclosure or distribution is prohibited. If you are not the intended
> recipient, please contact the sender by reply email and destroy all copies
> of the original message.
> ------------------------------
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15355.php
>
>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15356.php
>