Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] trunk compilation errors in jenkins
From: George Bosilca (bosilca_at_[hidden])
Date: 2014-07-26 18:37:11


All,

I’ll take advantage of this thread to clarify what is still missing for a perfectly MPI-agnostic BTL interface. Some of these issues are pretty straightforward (getting rid of RTE and OMPI vestiges); others will require some thought from their developers to cope with a non-conformant design (such as using MPI_COMM_WORLD in the BTL). So, here is an exhaustive list:

- Open IB uses quite a few ORTE internals: orte_proc_is_bound
- it also uses some functions/defines that I can’t find anywhere in the code base: ompi_progress_threads

- UGNI uses MPI_COMM_WORLD for internal management
- USNIC uses num_procs for internal management. It also directly calls ompi_rte_abort
- common OFACM uses num_procs for hash table allocation

Two items are of general interest as they affect our compatibility with past installations/usages:
- MPOOL alloc uses MPI level info keys …
- most of the BTL MCA parameters have not been renamed (!!!). Personally, I would be in favor of creating synonyms for now and then deprecating the OMPI versions in 2.0, but I don’t want to force this on everybody. So, the discussion is open on this topic.

Ralph and Jeff (I think you added the seq interface to TCP), please take a look at the following:
- the implementation of the TCP seq interface seems to be wrong: it used my_node_rank to compute the sequence number instead of my_local_rank (I changed it to use my_local_rank)

If you have any issue with the move, I’ll be happy to help and/or support you on the last steps toward a completely generic BTL. To facilitate your work I exposed a minimal set of OMPI information at the OPAL level. Take a look at opal/util/proc.h for more info, but please try not to expose more.

  Thanks,
    George.

On Jul 26, 2014, at 02:22 , Ralph Castain <rhc_at_[hidden]> wrote:

> That's because you folks didn't completely clean up the open fabrics stuff prior to the move - something that we warned about, but folks said they would resolve later :-)
>
> On Jul 25, 2014, at 11:19 PM, Mike Dubman <miked_at_[hidden]> wrote:
>
>> Making all in mca/common/ofacm
>> make[2]: Entering directory `/hpc/local/benchmarks/hpc-stack-gcc/src/install/ompi-master/opal/mca/common/ofacm'
>> CC libmca_common_ofacm_la-common_ofacm_base.lo
>> CC libmca_common_ofacm_la-common_ofacm_oob.lo
>> CC libmca_common_ofacm_la-common_ofacm_empty.lo
>> LN_S libmca_common_ofacm.la
>> common_ofacm_oob.c: In function 'oob_component_query':
>> common_ofacm_oob.c:178: warning: passing argument 4 of 'orte_rml.recv_buffer_nb' from incompatible pointer type
>> common_ofacm_oob.c:178: note: expected 'orte_rml_buffer_callback_fn_t' but argument is of type 'void (*)(int, opal_process_name_t *, struct opal_buffer_t *, ompi_rml_tag_t, void *)'
>> common_ofacm_xoob.c: In function 'xoob_context_init':
>> common_ofacm_xoob.c:354: error: request for member 'jobid' in something not a structure or union
>> common_ofacm_xoob.c: In function 'xoob_endpoint_fina
>> common_ofacm_oob.c:728: warning: passing argument 4 of 'orte_rml.send_buffer_nb' from incompatible pointer type
>> common_ofacm_oob.c:728: note: expected 'orte_rml_buffer_callback_fn_t' but argument is of type 'void (*)(int, opal_process_name_t *, struct opal_buffer_t *, ompi_rml_tag_t, void *)'
>> common_ofacm_xoob.c: In function 'xoob_send_connect_data':
>> common_ofacm_xoob.c:791: warning: passing argument 1 of 'orte_rml.send_buffer_nb' from incompatible pointer type
>> common_ofacm_xoob.c:791: note: expected 'struct orte_process_name_t *' but argument is of type 'opal_process_name_t *'
>> common_ofacm_xoob.c:791: warning: passing argument 4 of 'orte_rml.send_buffer_nb' from incompatible pointer type
>> common_ofacm_xoob.c:791: note: expected 'orte_rml_buffer_callback_fn_t' but argument is of type 'void (*)(int, opal_process_name_t *, struct opal_buffer_t *, ompi_rml_tag_t, void *)'
>> common_ofacm_xoob.c: In function 'xoob_recv_qp_create':
>> common_ofacm_xoob.c:963: warning: 'ibv_create_xrc_rcv_qp' is deprecated (declared at /usr/include/infiniband/ofa_verbs.h:126)
>> common_ofacm_xoob.c:983: warning: 'ibv_modify_xrc_rcv_qp' is deprecated (declared at /usr/include/infiniband/ofa_verbs.h:152)
>> common_ofacm_xoob.c:1011: warning: 'ibv_modify_xrc_rcv_qp' is deprecated (declared at /usr/include/infiniband/ofa_verbs.h:152)
>> common_ofacm_xoob.c: In function 'xoob_recv_qp_connect':
>> common_ofacm_xoob.c:1032: warning: 'ibv_reg_xrc_rcv_qp' is deprecated (declared at /usr/include/infiniband/ofa_verbs.h:185)
>> common_ofacm_xoob.c: In function 'xoob_component_query':
>> common_ofacm_xoob.c:1407: warning: passing argument 4 of 'orte_rml.recv_buffer_nb' from incompatible pointer type
>> common_ofacm_xoob.c:1407: note: expected 'orte_rml_buffer_callback_fn_t' but argument is of type 'void (*)(int, opal_process_name_t *, struct opal_buffer_t *, ompi_rml_tag_t, void *)'
>> make[2]: *** [libmca_common_ofacm_la-common_ofacm_xoob.lo] Error 1
>> make[2]: *** Waiting for unfinished jobs....
>> make[2]: Leaving directory `/hpc/local/benchmarks/hpc-stack-gcc/src/install/ompi-master/opal/mca/common/ofacm'
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/07/15271.php
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/07/15272.php