
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] btl tcp port to xensocket
From: Muhammad Atif (m_atif_s_at_[hidden])
Date: 2008-01-17 19:08:16


Thanks again.

Nope... at the moment I am doing the lame stuff, i.e. simply changing the tcp code, so I have not created another btl component. I know it's not the recommended thing, but I just wanted to try before committing. Apart from the xensocket-specific stuff, all I have done inside the btl/tcp code is to change the structure:

    struct mca_btl_tcp_addr_t {
        struct in_addr addr_inet;   /**< IPv4 address in network byte order */
        in_port_t addr_port;        /**< listen port */
        unsigned short addr_inuse;  /**< local meaning only */
        int xs_domU_ref;            /**< xs: domU memory reference */
    };

I wanted this structure to be passed on to all peers through the component exchange (modex send/recv). This way I have the normal socket listen port, its address and the xensocket memory reference (it's not complete, as it is missing some other info, but let's stick to the basic stuff).

The second question is regarding btl tcp recv. I have seen a couple of emails with some explanation specific to that particular user, but they do not seem to answer this question (ref. to previous email).

Best Regards,
Muhammad Atif

PS: I would love it if you could give some explanation of modex recv as well. ;) Thanks for all the support you guys are giving.

----- Original Message ----
From: Jeff Squyres <jsquyres_at_[hidden]>
To: Open MPI Developers <devel_at_[hidden]>
Sent: Friday, January 18, 2008 1:42:41 AM
Subject: Re: [OMPI devel] btl tcp port to xensocket

On Jan 15, 2008, at 6:07 PM, Muhammad Atif wrote:

> Just for reference, I am trying to port btl/tcp to xensockets. Now
> if I want to do modex send/recv, to my understanding
> mca_btl_tcp_addr_t is used (ref. code/function is
> mca_btl_tcp_component_exchange). For xensockets, I need to send only
> one additional integer, remote_domU_id, across to, say, all the peers
> (in refined code it would be specific to each domain, but I just
> want to have a clear understanding before I move any further).
> Now I have changed the struct mca_btl_tcp_addr_t present in
> btl_tcp_addr.h and have added int r_domu_id. This makes the size of
> the structure 12. Upon receive, mca_btl_tcp_proc_create() gives an
> error after mca_pml_base_modex_recv(), at the statement
> if(0 != (size % sizeof(mca_btl_tcp_addr_t))), that the sizes do not
> match. It is still expecting size 8, whereas I have made the size 12.
> I am unable to pinpoint the exact location where the size 8 is still
> embedded. Any ideas?

Just to be clear -- you have copied the tcp btl to another new name and are modifying that, right? E.g., ompi/mca/btl/xensocket?

If so, you need to modify the prefix of all the symbols to be btl_xensocket, and be sure to change the string name of your component in the component structure. The modex indexes off this string name, so it's important that it doesn't share a name with any other component in the framework.

> Second question is regarding the receive part of Open MPI. In my
> understanding, once the Recv API is called, control goes through the
> PML layer and everything initializes there. However, I am unable to
> get a lock on the layer/file/function where the receive socket
> polling is done. There are callbacks, but where or how exactly does
> Open MPI know that a message has in fact arrived? Any pointer will
> do :)

Which receive are you asking about here -- BTL receive or the modex receive?

> Best Regards,
> Muhammad Atif
> PS: Sorry if my questions are too basic.
>
> ----- Original Message ----
> From: Jeff Squyres <jsquyres_at_[hidden]>
> To: Open MPI Developers <devel_at_[hidden]>
> Sent: Friday, January 11, 2008 1:02:31 PM
> Subject: Re: [OMPI devel] btl tcp port to xensocket
>
> On Jan 10, 2008, at 8:40 PM, Muhammad Atif wrote:
>
> > Hi,
> > Thanks for such a detailed reply.
> > You are right, we have partitioned (normalized) our system with
> > Xen and have seen that the virtualization overhead is not that
> > great (for some applications) compared to the potential benefits
> > that we can get. We have executed various benchmarks on different
> > network/cluster configurations of Xen and native Linux and they
> > are really encouraging. The only known problem is inter-domain
> > communication in Xen, which is quite poor (1/6 of the native
> > memory transfer, not to mention 50% CPU utilization of the host).
> > We have tested out Xensocket, and these sockets give us a really
> > good performance boost in all respects.
> >
> > Now that I am having a look at the complex yet wonderful
> > architecture of Open MPI, can you guys give me some guidance on a
> > couple of naive questions?
> >
> > 1- How do I view the console output of an MPI process which is not
> > at the headnode? Do I have to have some parallel debugger? Or is
> > there any magical technique?
>
> OMPI's run-time environment takes care of redirecting stdout/stderr
> from each MPI process to the stdout/stderr of mpirun for you (this is
> another use of the "out of band" TCP channel that is set up between
> mpirun and all the MPI processes).
>
> > 2- How do I set up the GPR?
>
> You don't. The GPR is automatically instantiated in mpirun upon
> startup.
>
> > Say I have a struct foo, and all processes have at least one such
> > instance of foo. From what I gather, Open MPI will create a linked
> > list of foo's that were passed on (though I am unable to pass one
> > on). Where do I have to define struct foo so that it can be
> > exchanged b/w the processes? I know it's a lame question, but I
> > think I am getting lost in the sea. :(
>
> I assume you're asking about the modex.
>
> Every BTL defines its own data that is passed around in the modex. It
> is assumed that only modules of the same BTL type will be able to
> read/understand that data.
> The modex just treats the data as a blob; all the modex blobs are
> gathered into mpirun and then broadcast out to every MPI process (I
> said scatter in my previous mail; broadcast is more accurate).
>
> So when you modex_send, you just pass a pointer to a chunk of memory
> and a length (e.g., a pointer to a struct instance and a length).
> When you modex_receive, you can just dereference the blob that you
> get back as the same struct type that you modex_send'ed previously
> (because you can only receive blobs from BTL modules that are the same
> type as you, and therefore the data they sent is the same type of data
> that you sent).
>
> You can do more complex things in the modex if you need to, of
> course. For example, we're changing the openib BTL to send variable
> length data in the modex, but that requires a bit more setup and I
> suspect you don't need to do this.
>
> > Best Regards,
> > Muhammad Atif
> > PS: I am totally new to MPI internals. So if at all we decide to go
> > ahead with the project, I would be regular bugger in the list.
>
> That's what we're here for. We don't always reply immediately, but we
> try. :-)
>
> > ----- Original Message ----
> > From: Adrian Knoth <adi_at_[hidden]>
> > To: Open MPI Developers <devel_at_[hidden]>
> > Sent: Thursday, January 10, 2008 1:24:01 AM
> > Subject: Re: [OMPI devel] btl tcp port to xensocket
> >
> > On Tue, Jan 08, 2008 at 10:51:45PM -0800, Muhammad Atif wrote:
> >
> > > I am planning to port tcp component to xensocket, which is a fast
> > > interdomain communication mechanism for guest domains in Xen. I may
> >
> > Just to get things right: You first partition your SMP/multicore
> > system with Xen, and then want to re-combine it later for MPI
> > communication?
> >
> > Wouldn't it be easier to leave the unpartitioned host alone and use
> > shared memory communication instead?
> >
> > > As per design, and the fact that these sockets are not normal
> > > sockets, I have to pass certain information (basically memory
> > > references, guest domain info etc) to other peers once sockets
> > > have been created. I
> >
> > There's ORTE, the runtime environment. It employs OOB/tcp to have a
> > so-called out-of-band channel. ORTE also provides a general purpose
> > registry (GPR).
> >
> > Once a TCP connection between the headnode process and all other
> > peers is established, you can store your required information in
> > the GPR.
> >
> > > understand that mca_pml_base_modex_send and recv (or simply using
> > > mca_btl_tcp_component_exchange) can be used to exchange
> > > information,
> >
> > Use mca_pml_base_modex_send (now ompi_modex_send) and encode your
> > required information. It's getting stored in the GPR. Read it back
> > with mca_pml_base_modex_recv (ompi_modex_recv), as it is done in
> > mca_btl_tcp_component_exchange and mca_btl_tcp_proc_create.
> >
> > > but I cannot seem to get them to communicate. So to put my
> > > question in a very simple way..... I want to create a socket
> > > structure containing necessary information, and then pass it to
> > > all other peers before start of actual mpi communication. What is
> > > the easiest way to do it.
> >
> > Quite the same way. mca_btl_tcp_component_exchange assembles the
> > required information and stores it in the GPR by calling
> > ompi_modex_send.
> >
> > mca_btl_tcp_proc_create (think of "the other peers") reads this
> > information into local context.
> >
> > I guess you might want to copy btl/tcp to let's say btl/xen, so you
> > can modify internal structures, if required. Perhaps xensockets
> > don't need IP addresses, as they are actually memory sockets.
> >
> > However, you'll still need TCP communication between Xen guests for
> > the OOB channel.
> >
> > As mentioned above, I'm not sure if it's reasonable to use Xen and
> > MPI at all.
> > Virtualization overhead might decrease your performance, and
> > that's usually the last thing you want to have when using MPI ;)
> >
> > HTH
> >
> > --
> > Cluster and Metacomputing Working Group
> > Friedrich-Schiller-Universität Jena, Germany
> >
> > private: http://adi.thur.de
> > _______________________________________________
> > devel mailing list
> > devel_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
> --
> Jeff Squyres
> Cisco Systems

--
Jeff Squyres
Cisco Systems
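
For readers following the size-mismatch question in this thread, here is a minimal sketch in plain C of the "modex blob" idea and the size check quoted above. It does not use any Open MPI API: struct xs_addr, xs_blob_pack and xs_blob_unpack are hypothetical names invented for illustration, and fixed-width integer types stand in for in_addr/in_port_t so that the 12-byte layout is predictable on common platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the modified address struct in the thread.
 * 4 + 2 + 2 + 4 bytes = 12 bytes on common ABIs (no padding needed). */
struct xs_addr {
    uint32_t addr_inet;    /* IPv4 address in network byte order */
    uint16_t addr_port;    /* listen port */
    uint16_t addr_inuse;   /* local meaning only */
    int32_t  xs_domU_ref;  /* xensocket domU memory reference */
};

/* "Send" side: the blob is opaque, so publishing is just copying
 * `count` structs into a byte buffer and reporting the length.
 * Returns the blob length in bytes, or 0 if the buffer is too small. */
size_t xs_blob_pack(const struct xs_addr *addrs, size_t count,
                    unsigned char *blob, size_t cap)
{
    assert(addrs != NULL && blob != NULL);
    size_t len = count * sizeof(struct xs_addr);
    if (len > cap) {
        return 0;
    }
    memcpy(blob, addrs, len);
    return len;
}

/* "Receive" side: mirrors the check quoted in the thread. The blob
 * length must be a whole multiple of the struct size the receiver was
 * compiled with. Returns the number of addresses copied, or -1. */
long xs_blob_unpack(const unsigned char *blob, size_t size,
                    struct xs_addr *out, size_t max_out)
{
    assert(blob != NULL && out != NULL);
    if (size % sizeof(struct xs_addr) != 0) {
        return -1;  /* sender and receiver disagree on the layout */
    }
    size_t count = size / sizeof(struct xs_addr);
    if (count > max_out) {
        return -1;
    }
    memcpy(out, blob, size);
    return (long)count;
}
```

In this sketch, a blob packed from two structs is 24 bytes long and unpacks cleanly, while an 8-byte blob (the size of the unmodified struct) is rejected by the modulo check. That is the kind of failure one would expect if the two sides were not built from the same struct definition, or if the blob being read was published by a different component under the same name.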