
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] btl tcp port to ...
From: Muhammad Atif (m_atif_s_at_[hidden])
Date: 2008-01-17 09:27:44

Hi,

Thanks a lot for the reply. You have understood my problem correctly, but I am unable to follow your suggestion about where to look. The btl-size is shown as 12 and size as 8.

My understanding of mca_btl_tcp_component_exchange() is a little different, or perhaps wrong, so please correct me if so. When we do the exchange, i.e. in mca_btl_tcp_component_exchange(), the size is calculated as

    size_t size = mca_btl_tcp_component.tcp_num_btls * sizeof(mca_btl_tcp_addr_t);

This gives me the correct size. I have only one tcp_num_btls, so size comes out as 12. We then allocate memory with

    mca_btl_tcp_addr_t *addrs = (mca_btl_tcp_addr_t *)malloc(size);

As size is 12, this gives me the correct allocation. Lastly,

    rc = mca_pml_base_modex_send(&mca_btl_tcp_component.super.btl_version, addrs, size);

This sends addrs with size 12. Should that not work out of the box? Or is there more attached that is not transparent?

Can you please explain this statement a bit more? I think it holds the key to my solution, but I am not able to follow it:

"We copy the information to be sent into the addrs array and increase xfer_size afterwards (telling the function how many bytes to be transferred)."

Where exactly are we increasing the size?

Best Regards,
Muhammad Atif

----- Original Message ----
From: Adrian Knoth <adi_at_[hidden]>
To: Open MPI Developers <devel_at_[hidden]>
Sent: Thursday, January 17, 2008 11:43:24 PM
Subject: Re: [OMPI devel] btl tcp port to xensocket

On Tue, Jan 15, 2008 at 04:07:02PM -0800, Muhammad Atif wrote:

> Just for reference, I am trying to port btl/tcp to xensockets. Now if
> I want to do modex send/recv, to my understanding, mca_btl_tcp_addr_t
> is used (ref code/function is mca_btl_tcp_component_exchange). For
> xensockets, I need to send only one additional integer, remote_domU_id,
> across to all the peers (in refined code it would be specific to
> each domain, but I just want to have a clear understanding before I
> move any further). Now I have changed the struct mca_btl_tcp_addr_t
> in btl_tcp_addr.h and have added int r_domu_id. This makes the size of
> the structure 12. On receive, mca_btl_tcp_proc_create() gives an error
> after mca_pml_base_modex_recv(), at the statement
> if(0 != (size % sizeof(mca_btl_tcp_addr_t))), saying that the sizes do
> not match. It is still expecting size 8, whereas I have made the size
> 12. I am unable to pinpoint the exact location where the size 8 is
> still embedded. Any ideas?

Just an idea: the failing check prints this error:

    BTL_ERROR(("mca_base_modex_recv: invalid size %d: btl-size: %d\n", size, sizeof(mca_btl_tcp_addr_t)));

So what is wrong? Is btl-size shown as 12 or as 8? It should be 12. And is size just 8? If so, this means you forgot to include your new socket in your modex_send_request.

See mca_btl_tcp_component_exchange: we copy the information to be sent into the addrs array and increase xfer_size afterwards (telling the function how many bytes are to be transferred). Perhaps you missed something there.

--
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany
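
For readers following the thread, here is a minimal standalone sketch of the two pieces under discussion: a sender that grows xfer_size as it copies each entry into the addrs array, and a receiver that rejects a payload whose length is not a whole multiple of its own struct size. Everything in it is an assumption made for illustration: old_addr_t, new_addr_t, pack_addrs() and modex_size_ok() are hypothetical stand-ins and are not the Open MPI definitions; only the field name r_domu_id and the sizes 8 and 12 come from the messages above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the unmodified address struct (8 bytes). */
typedef struct {
    uint32_t addr_inet;   /* IPv4 address */
    uint16_t addr_port;   /* listen port */
    uint16_t addr_inuse;  /* in-use flag */
} old_addr_t;

/* Hypothetical stand-in for the struct after adding r_domu_id (12 bytes). */
typedef struct {
    uint32_t addr_inet;
    uint16_t addr_port;
    uint16_t addr_inuse;
    int32_t  r_domu_id;   /* extra field from the xensocket port */
} new_addr_t;

/* Sender side: copy each entry into the outgoing array and grow
 * xfer_size by one element per copy. The return value is the byte
 * count that would be handed to the modex send call. */
static size_t pack_addrs(new_addr_t *out, const new_addr_t *in, size_t n)
{
    size_t xfer_size = 0;
    assert(out != NULL && in != NULL);
    for (size_t i = 0; i < n; i++) {
        memcpy(&out[i], &in[i], sizeof(new_addr_t));
        xfer_size += sizeof(new_addr_t);
    }
    return xfer_size;
}

/* Receiver side: same idea as the check quoted in the thread,
 * 0 != (size % sizeof(mca_btl_tcp_addr_t)) means "invalid size". */
static int modex_size_ok(size_t received, size_t elem_size)
{
    return elem_size != 0 && (received % elem_size) == 0;
}
```

With these definitions, modex_size_ok(pack_addrs(...), sizeof(new_addr_t)) holds for a payload packed from the 12-byte struct, while modex_size_ok(sizeof(old_addr_t), sizeof(new_addr_t)) fails. That is the symptom reported above: an 8-byte payload arriving at a receiver that expects 12-byte entries, which suggests some part of the send path (or some peer) is still using the old size.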