Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] trunk compilation errors in jenkins
From: George Bosilca (bosilca_at_[hidden])
Date: 2014-07-30 12:29:29


The underlying structure changed, so a little bit of fiddling is normal.
Instead of using a field in the ompi_proc_t you are now using a field down
in opal_proc_t, a field that simply cannot have the same type as before
(orte_process_name_t).

  George.

On Wed, Jul 30, 2014 at 12:19 PM, Ralph Castain <rhc_at_[hidden]> wrote:

> George - my point was that we regularly tested using the method in that
> routine, and now we have to do something a little different. So it is an
> "issue" in that we have to make changes across the code base to ensure we
> do things the "new" way, that's all.
>
> On Jul 30, 2014, at 9:17 AM, George Bosilca <bosilca_at_[hidden]> wrote:
>
> No, this is not going to be an issue if the opal_identifier_t is used
> correctly (aka only via the exposed accessors).
>
> George.
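One way to read "only via the exposed accessors" is that the 64-bit slot is written and read by a pair of functions that alone know the layout. The sketch below assumes a two-field name like orte_process_name_t; the names demo_opal_name_t, demo_name_t, demo_name_to_opal and demo_name_from_opal are hypothetical illustrations, not the actual OPAL or ORTE API. Copying bytes with memcpy makes the round trip exact on either byte order and avoids the strict-aliasing problems of casting a uint64_t pointer to a struct pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Opaque 64-bit slot, mirroring "typedef uint64_t opal_identifier_t". */
typedef uint64_t demo_opal_name_t;

/* Hypothetical stand-in for orte_process_name_t: two 32-bit fields. */
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} demo_name_t;

/* The byte copy below is only meaningful if both types have the same size. */
static_assert(sizeof(demo_name_t) == sizeof(demo_opal_name_t),
              "name must fill the opaque identifier exactly");

/* Accessor: store an upper-layer name in the opaque slot. */
static demo_opal_name_t demo_name_to_opal(const demo_name_t *name)
{
    demo_opal_name_t id;
    assert(name != NULL);
    memcpy(&id, name, sizeof id);
    return id;
}

/* Accessor: recover the upper-layer name from the opaque slot. */
static demo_name_t demo_name_from_opal(demo_opal_name_t id)
{
    demo_name_t name;
    memcpy(&name, &id, sizeof name);
    return name;
}
```

Packing {jobid = 2452946945, vpid = 1} and unpacking it again yields the same two fields on any host; only code that inspects the integer value directly (shifts, masks, numeric comparison) depends on byte order.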
>
>
>
> On Wed, Jul 30, 2014 at 12:09 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>
>> Yeah, my fix won't work for big-endian machines - this is going to be an
>> issue across the code base now, so we'll have to trawl through the code and
>> fix it. I was doing the minimal change required to fix the trunk in the meantime.
>>
>> On Jul 30, 2014, at 9:06 AM, George Bosilca <bosilca_at_[hidden]> wrote:
>>
>> Yes. opal_process_name_t has basically no meaning by itself; it is a 64-bit
>> storage location used by the upper layer to save some local key that
>> can later be used to extract information. Calling the OPAL-level compare
>> function might be a better fit there.
>>
>> George.
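A compare routine at that level might look like the sketch below, under the same assumptions as before: the names are hypothetical and this is not the real OPAL compare function. It takes pointers to the opaque identifiers, unpacks them with memcpy, and orders by jobid and then vpid, so callers never dereference the 64-bit slot as if it were an orte_process_name_t.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t demo_opal_name_t;                     /* opaque 64-bit slot */
typedef struct { uint32_t jobid, vpid; } demo_name_t;  /* hypothetical layout */

static_assert(sizeof(demo_name_t) == sizeof(demo_opal_name_t),
              "name must fill the opaque identifier exactly");

/* strcmp-style result: negative, zero or positive. Orders by jobid first,
   then vpid, after unpacking both identifiers into real structs. */
static int demo_name_compare(const demo_opal_name_t *a, const demo_opal_name_t *b)
{
    demo_name_t na, nb;
    assert(a != NULL && b != NULL);
    memcpy(&na, a, sizeof na);
    memcpy(&nb, b, sizeof nb);
    if (na.jobid != nb.jobid) {
        return na.jobid < nb.jobid ? -1 : 1;
    }
    if (na.vpid != nb.vpid) {
        return na.vpid < nb.vpid ? -1 : 1;
    }
    return 0;
}
```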
>>
>>
>>
>> On Wed, Jul 30, 2014 at 11:50 AM, Gilles Gouaillardet <
>> gilles.gouaillardet_at_[hidden]> wrote:
>>
>>> Ralph,
>>>
>>> Was it really that simple?
>>>
>>> proc_temp->super.proc_name has type opal_process_name_t:
>>>     typedef opal_identifier_t opal_process_name_t;
>>>     typedef uint64_t opal_identifier_t;
>>>
>>> *but*
>>>
>>> item_ptr->peer has type orte_process_name_t:
>>>     struct orte_process_name_t {
>>>         orte_jobid_t jobid;
>>>         orte_vpid_t vpid;
>>>     };
>>>
>>> Bottom line: is r32357 still valid on a big-endian arch?
>>>
>>> Cheers,
>>>
>>> Gilles
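The big-endian concern can be made concrete with a small sketch. It does not reproduce r32357; it only shows what happens when a two-field name (the hypothetical demo_name_t) is copied byte-for-byte into a uint64_t and the low 32 bits are then read arithmetically. On a little-endian host that yields jobid, on a big-endian host it yields vpid, so shift- or mask-based handling of the identifier is not portable unless it is written with byte order in mind.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint32_t jobid, vpid; } demo_name_t;  /* hypothetical layout */

static_assert(sizeof(demo_name_t) == sizeof(uint64_t),
              "name must fill a uint64_t exactly");

/* 1 if the host stores the least-significant byte first. */
static int host_is_little_endian(void)
{
    const uint32_t probe = 1u;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

/* Copy the struct's bytes into a uint64_t, then read the low 32 bits
   arithmetically. Which field comes out depends on byte order. */
static uint32_t low_word_of_packed(demo_name_t name)
{
    uint64_t id;
    memcpy(&id, &name, sizeof id);
    return (uint32_t)(id & 0xFFFFFFFFu);
}
```

With {jobid = 7, vpid = 9}, a little-endian host returns 7 and a big-endian host returns 9.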
>>>
>>>
>>> On Wed, Jul 30, 2014 at 11:49 PM, Ralph Castain <rhc_at_[hidden]>
>>> wrote:
>>>
>>>> I just fixed this one - all that was required was an ampersand, as the
>>>> name was being passed into the function instead of a pointer to the name.
>>>>
>>>> r32357
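The class of bug fixed here is that a comparison routine takes pointers to names, but the caller handed it the name's value. With the name now stored in an integer type, that value can reach a pointer parameter with only an int-conversion warning, depending on the compiler, and the callee then dereferences a meaningless address. The sketch below uses hypothetical names (it is not the ORTE signature) and, rather than passing the address of the 64-bit slot itself, copies the identifier into a real struct and passes that struct's address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t demo_opal_name_t;                     /* opaque 64-bit slot */
typedef struct { uint32_t jobid, vpid; } demo_name_t;  /* hypothetical layout */

static_assert(sizeof(demo_name_t) == sizeof(demo_opal_name_t),
              "name must fill the opaque identifier exactly");

/* Pointer-taking interface, shaped like a name comparison routine. */
static int demo_names_equal(const demo_name_t *name1, const demo_name_t *name2)
{
    assert(name1 != NULL && name2 != NULL);
    return name1->jobid == name2->jobid && name1->vpid == name2->vpid;
}

/* Caller holding an opaque identifier. Passing `id` itself where a pointer
   is expected is the bug; instead, copy it into a struct and pass &name. */
static int demo_peer_matches(demo_opal_name_t id, const demo_name_t *peer)
{
    demo_name_t name;
    memcpy(&name, &id, sizeof name);
    return demo_names_equal(&name, peer);
}
```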
>>>>
>>>> On Jul 30, 2014, at 7:43 AM, Gilles GOUAILLARDET <
>>>> gilles.gouaillardet_at_[hidden]> wrote:
>>>>
>>>> Rolf,
>>>>
>>>> r32353 can be seen as a suspect...
>>>> Even if it is correct, it might have exposed the bug discussed in #4815
>>>> even more (e.g. we hit the bug 100% of the time after the fix).
>>>>
>>>> Does the patch attached to #4815 fix the problem?
>>>>
>>>> If yes, and if you see this issue as a showstopper, feel free to commit
>>>> it and drop a note to #4815
>>>> (I am AFK until tomorrow).
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>> Rolf vandeVaart <rvandevaart_at_[hidden]> wrote:
>>>>
>>>> Just an FYI that my trunk version (r32355) does not work at all anymore
>>>> if I do not include "--mca coll ^ml". Here is a stack trace from the
>>>> ibm/pt2pt/send test running on a single node.
>>>>
>>>> (gdb) where
>>>> #0 0x00007f6c0d1321d0 in ?? ()
>>>> #1 <signal handler called>
>>>> #2 0x00007f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017', name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522
>>>> #3 0x00007f6c0bea17be in bcol_basesmuma_smcm_allgather_connection (sm_bcol_module=0x7f6bf3b68040, module=0xb3d200, peer_list=0x7f6c0c0a6748, back_files=0x7f6bf3ffd6c8, comm=0x6037a0, input=..., base_fname=0x7f6c0bea2606 "sm_payload_mem_", map_all=false) at ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c:237
>>>> #4 0x00007f6c0be98307 in bcol_basesmuma_bank_init_opti (payload_block=0xbc0f60, data_offset=64, bcol_module=0x7f6bf3b68040, reg_data=0xba28c0) at ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_buf_mgmt.c:302
>>>> #5 0x00007f6c0cced386 in mca_coll_ml_register_bcols (ml_module=0xba5c40) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:510
>>>> #6 0x00007f6c0cced68f in ml_module_memory_initialization (ml_module=0xba5c40) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:558
>>>> #7 0x00007f6c0ccf06b1 in ml_discover_hierarchy (ml_module=0xba5c40) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:1539
>>>> #8 0x00007f6c0ccf4e0b in mca_coll_ml_comm_query (comm=0x6037a0, priority=0x7fffe7991b58) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:2963
>>>> #9 0x00007f6c18cc5b09 in query_2_0_0 (component=0x7f6c0cf50940, comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:372
>>>> #10 0x00007f6c18cc5ac8 in query (component=0x7f6c0cf50940, comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:355
>>>> #11 0x00007f6c18cc59d2 in check_one_component (comm=0x6037a0, component=0x7f6c0cf50940, module=0x7fffe7991b90) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:317
>>>> #12 0x00007f6c18cc5818 in check_components (components=0x7f6c18f46ef0, comm=0x6037a0) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:281
>>>> #13 0x00007f6c18cbe3c9 in mca_coll_base_comm_select (comm=0x6037a0) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:117
>>>> #14 0x00007f6c18c52301 in ompi_mpi_init (argc=1, argv=0x7fffe79924c8, requested=0, provided=0x7fffe79922e8) at ../../ompi/runtime/ompi_mpi_init.c:918
>>>> #15 0x00007f6c18c86e92 in PMPI_Init (argc=0x7fffe799234c, argv=0x7fffe7992340) at pinit.c:84
>>>> #16 0x0000000000401056 in main (argc=1, argv=0x7fffe79924c8) at send.c:32
>>>> (gdb) up
>>>> #1 <signal handler called>
>>>> (gdb) up
>>>> #2 0x00007f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017', name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522
>>>> 522 if (name1->jobid < name2->jobid) {
>>>> (gdb) print name1
>>>> $1 = (const orte_process_name_t *) 0x192350001
>>>> (gdb) print *name1
>>>> Cannot access memory at address 0x192350001
>>>> (gdb) print name2
>>>> $2 = (const orte_process_name_t *) 0xbaf76c
>>>> (gdb) print *name2
>>>> $3 = {jobid = 2452946945, vpid = 1}
>>>> (gdb)
>>>>
>>>> >-----Original Message-----
>>>> >From: devel [mailto:devel-bounces_at_[hidden]] On Behalf Of Gilles Gouaillardet
>>>> >Sent: Wednesday, July 30, 2014 2:16 AM
>>>> >To: Open MPI Developers
>>>> >Subject: Re: [OMPI devel] trunk compilation errors in jenkins
>>>> >
>>>> >George,
>>>> >
>>>> >#4815 is indirectly related to the move:
>>>> >
>>>> >In bcol/basesmuma, we used to compare ompi_process_name_t, and now
>>>> >we (try to) compare an ompi_process_name_t and an opal_process_name_t
>>>> >(which causes a glorious SIGSEGV).
>>>> >
>>>> >I proposed a temporary patch which is both broken and inelegant; could you
>>>> >please advise a correct solution?
>>>> >
>>>> >Cheers,
>>>> >
>>>> >Gilles
>>>> >
>>>> >On 2014/07/27 7:37, George Bosilca wrote:
>>>> >> If you have any issue with the move, I’ll be happy to help and/or support
>>>> >> you on your last move toward a completely generic BTL. To facilitate your
>>>> >> work I exposed a minimalistic set of OMPI information at the OPAL level. Take
>>>> >> a look at opal/util/proc.h for more info, but please try not to expose more.
>>>> >
>>>> >_______________________________________________
>>>> >devel mailing list
>>>> >devel_at_[hidden]
>>>> >Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> >Link to this post: http://www.open-mpi.org/community/lists/devel/2014/07/15348.php
>>>>
>>>
>>>
>>>
>>
>>
>