Thanks a lot for the reply. You have understood my problem correctly, but I am unable to follow your solution or your suggestion about where to look. The btl_size is shown as 12 and size as 8. My understanding of the mca_btl_tcp_component_exchange function is a little different, or perhaps wrong, so please correct me if I am wrong.

Once we do the exchange, i.e. mca_btl_tcp_component_exchange(), the size is calculated as

size_t size = mca_btl_tcp_component.tcp_num_btls * sizeof(mca_btl_tcp_addr_t);

This gives me the correct size. I have only one BTL (tcp_num_btls is 1), so size is 12. We then allocate memory with

mca_btl_tcp_addr_t *addrs = (mca_btl_tcp_addr_t *)malloc(size);

Since size is 12, this gives me the correct allocation. And lastly

rc =  mca_pml_base_modex_send(&mca_btl_tcp_component.super.btl_version, addrs, size);

This sends addrs with a size of 12. Shouldn't that work out of the box? Or is there more going on that is not apparent?
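To make the arithmetic concrete, here is a minimal sketch of the sender-side size calculation and the receive-side check (size % sizeof(mca_btl_tcp_addr_t)). The two structs and the helper names are hypothetical stand-ins, not the real Open MPI definitions; the field layout is only assumed so that the sizes come out as 8 and 12 on common ABIs. It shows that a 12-byte payload passes the check, while an 8-byte payload fails it when the receiver expects 12. That would happen, for example, if some peer were still built against the unmodified struct, which is only a guess at the cause.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the unmodified address struct (8 bytes). */
typedef struct {
    uint32_t addr_inet;
    uint16_t addr_port;
    uint16_t addr_inuse;
} old_addr_t;

/* Hypothetical stand-in for the struct with the extra r_domu_id (12 bytes). */
typedef struct {
    uint32_t addr_inet;
    uint16_t addr_port;
    uint16_t addr_inuse;
    int32_t  r_domu_id;
} new_addr_t;

/* Sender side: number of bytes published for num_btls interfaces. */
static size_t payload_size(size_t num_btls, size_t elem_size)
{
    return num_btls * elem_size;
}

/* Receiver side: same test as in the thread, size % sizeof(addr) == 0. */
static int payload_size_ok(size_t received, size_t elem_size)
{
    return elem_size != 0 && received % elem_size == 0;
}

/* Walks through the two scenarios discussed in the thread. */
static void demo(void)
{
    /* Both sides use the new struct: 12 bytes sent, 12 expected. */
    size_t sent_new = payload_size(1, sizeof(new_addr_t));
    assert(payload_size_ok(sent_new, sizeof(new_addr_t)));

    /* Sender uses the old struct: 8 bytes sent, 12 expected, check fails. */
    size_t sent_old = payload_size(1, sizeof(old_addr_t));
    assert(!payload_size_ok(sent_old, sizeof(new_addr_t)));
}
```

Calling demo() exercises both cases. The second case corresponds to the reported message (size 8, btl-size 12).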

Can you please explain the following statement in more detail? I think it holds the key to my solution, but I am not able to follow it.
"We copy the information to be sent into the addrs array and increase xfer_size afterwards (telling the function how many bytes to be transferred)."
Where exactly are we increasing the size?
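One possible reading of that statement, sketched below under assumptions: the exchange code loops over the local interfaces, copies one entry per usable interface into the addrs array, and grows a byte counter by sizeof(entry) each time, so that the counter rather than a precomputed size is what gets passed to the modex send. The struct, the function name sketch_pack and its parameters are hypothetical. Only the name xfer_size comes from the quoted sentence, and this is not the actual Open MPI code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical address entry, including the extra r_domu_id field. */
typedef struct {
    uint32_t addr_inet;
    uint16_t addr_port;
    uint16_t addr_inuse;
    int32_t  r_domu_id;
} sketch_addr_t;

/*
 * Copies one entry per configured interface into addrs and returns the
 * number of bytes that should be handed to the send call.
 * An IP of 0 is treated as "interface not configured" (an assumption).
 * addrs must have room for n entries.
 */
static size_t sketch_pack(const uint32_t *ips, size_t n, uint16_t port,
                          int32_t domu_id, sketch_addr_t *addrs)
{
    size_t xfer_size = 0;   /* bytes to transfer, grows per entry */
    size_t used = 0;        /* entries written so far */

    assert(ips != NULL && addrs != NULL);

    for (size_t i = 0; i < n; i++) {
        if (ips[i] == 0) {
            continue;       /* skip unconfigured interfaces */
        }
        memset(&addrs[used], 0, sizeof(addrs[used]));
        addrs[used].addr_inet  = ips[i];
        addrs[used].addr_port  = port;
        addrs[used].addr_inuse = 0;
        addrs[used].r_domu_id  = domu_id;
        used++;
        xfer_size += sizeof(sketch_addr_t);  /* the "increase" step */
    }
    return xfer_size;
}
```

With this pattern, a new field is sent only if it is part of the entry type whose sizeof is added to the counter, on every node that takes part in the exchange.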

Best Regards,
Muhammad Atif

----- Original Message ----
From: Adrian Knoth <adi@drcomp.erfurt.thur.de>
To: Open MPI Developers <devel@open-mpi.org>
Sent: Thursday, January 17, 2008 11:43:24 PM
Subject: Re: [OMPI devel] btl tcp port to xensocket

On Tue, Jan 15, 2008 at 04:07:02PM -0800, Muhammad Atif wrote:

> Just for reference, I am trying to port btl/tcp to xensockets. Now if
> i want to do modex send/recv , to my understanding, mca_btl_tcp_addr_t
> is used (ref code/function is mca_btl_tcp_component_exchange). For
> xensockets, I need to send only one additional integer remote_domU_id
> across to say all the peers (in refined code it would be specific to
> each domain, but i just want to have clear understanding before i move
> any further). Now I have changed the struct mca_btl_tcp_addr_t present
> in btl_tcp_addr.h and have added int r_domu_id. This makes the size of
> structure 12. Upon receive mca_btl_tcp_proc_create() gives an error
> after mca_pml_base_modex_recv() and at this statement if(0 != (size %
> sizeof(mca_btl_tcp_addr_t))) that size do not match. It is still
> expecting size 8, where as i have made the size 12.  I am unable to
> pin point the exact location where the size 8 is still embedded. Any
> ideas?

Just an idea: the mca_base_modex_recv error gives you this error:

          BTL_ERROR(("mca_base_modex_recv: invalid size %d: btl-size: %d\n", size, sizeof(mca_btl_tcp_addr_t)));

So what is wrong? Is btl-size shown as 12 or as 8? It should be 12. And
is size just 8? So this means you forgot to include your new socket in
your modex_send_request.

See mca_btl_tcp_component_exchange: We copy the information to be sent
into the addrs array and increase xfer_size afterwards (telling the
function how many bytes to be transferred).

Perhaps you missed something there.

Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de