Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [OMPI svn] svn:open-mpi r17307
From: Tim Prins (tprins_at_[hidden])
Date: 2008-02-01 11:40:20


Adrian,

For the most part this seems to work for me, but there are a few issues.
I'm not sure which are introduced by this patch, and whether some may be
expected behavior, but for completeness I will point them all out.
First, let me explain that I am working on a machine with three TCP
interfaces: lo, eth0, and ib0. Both eth0 and ib0 connect all the compute nodes.

1. There are some warnings when compiling:
btl_tcp_proc.c:171: warning: no previous prototype for 'evaluate_assignment'
btl_tcp_proc.c:206: warning: no previous prototype for 'visit'
btl_tcp_proc.c:224: warning: no previous prototype for 'mca_btl_tcp_initialise_interface'
btl_tcp_proc.c: In function `mca_btl_tcp_proc_insert':
btl_tcp_proc.c:304: warning: pointer targets in passing arg 2 of `opal_ifindextomask' differ in signedness
btl_tcp_proc.c:313: warning: pointer targets in passing arg 2 of `opal_ifindextomask' differ in signedness
btl_tcp_proc.c:389: warning: comparison between signed and unsigned
btl_tcp_proc.c:400: warning: comparison between signed and unsigned
btl_tcp_proc.c:401: warning: comparison between signed and unsigned
btl_tcp_proc.c:459: warning: ISO C90 forbids variable-size array `a'
btl_tcp_proc.c:459: warning: ISO C90 forbids mixed declarations and code
btl_tcp_proc.c:465: warning: ISO C90 forbids mixed declarations and code
btl_tcp_proc.c:466: warning: comparison between signed and unsigned
btl_tcp_proc.c:480: warning: comparison between signed and unsigned
btl_tcp_proc.c:485: warning: comparison between signed and unsigned
btl_tcp_proc.c:495: warning: comparison between signed and unsigned
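
Most of these warnings fall into three groups: missing prototypes, signed/unsigned comparisons, and C90 violations (the variable-size array and mixed declarations). The following is only a sketch of the usual remedies, not the actual code from btl_tcp_proc.c. The names sum_weights and sum_of_indices are hypothetical and exist purely for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, not taken from btl_tcp_proc.c.
 * A file-local function declared static has internal linkage, so the
 * compiler does not expect a previous prototype for it. */
static int sum_weights(const int *weights, size_t count)
{
    size_t i;       /* same unsigned type as count: no signed/unsigned comparison */
    int total = 0;

    for (i = 0; i < count; i++) {
        total += weights[i];
    }
    return total;
}

/* A function that must stay externally visible gets a prototype first
 * (normally in a header). */
int sum_of_indices(size_t n);

/* Hypothetical example: returns 0 + 1 + ... + (n - 1), or -1 if the
 * allocation fails. */
int sum_of_indices(size_t n)
{
    int *a;         /* C90: all declarations come before the first statement */
    size_t i;
    int total;

    if (0 == n) {
        return 0;
    }
    /* Heap buffer instead of a variable-size array such as "int a[n]". */
    a = (int *) malloc(n * sizeof(int));
    if (NULL == a) {
        return -1;
    }
    for (i = 0; i < n; i++) {
        a[i] = (int) i;
    }
    total = sum_weights(a, n);
    free(a);
    return total;
}
```

Whether static is appropriate for evaluate_assignment, visit, and mca_btl_tcp_initialise_interface depends on whether anything outside the file calls them, which I have not checked.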

2. If I exclude all my TCP interfaces, the connection fails as expected,
but I do get a malloc request for 0 bytes:
[tprins_at_odin examples]$ mpirun -mca btl tcp,self -mca btl_tcp_if_exclude eth0,ib0,lo -np 2 ./ring_c
malloc debug: Request for 0 bytes (btl_tcp_component.c, 844)
malloc debug: Request for 0 bytes (btl_tcp_component.c, 844)
<snip>
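
The zero-byte request presumably comes from an allocation sized by the number of usable interfaces, which is zero once everything is excluded. I have not traced line 844, so this is an assumption. A minimal sketch of one way to guard such an allocation follows; alloc_addr_table and its parameters are hypothetical names, not part of the BTL.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a zeroed table with room for `count`
 * entries of `elem_size` bytes each. On return, *out_count holds the
 * number of entries actually available. */
void *alloc_addr_table(size_t count, size_t elem_size, size_t *out_count);

void *alloc_addr_table(size_t count, size_t elem_size, size_t *out_count)
{
    void *table;

    *out_count = 0;
    if (0 == count || 0 == elem_size) {
        /* Nothing to allocate: return early instead of requesting 0 bytes. */
        return NULL;
    }
    table = calloc(count, elem_size);
    if (NULL != table) {
        *out_count = count;
    }
    return table;
}
```

With this shape the caller has to treat a NULL result with *out_count == 0 as "no interfaces" rather than as an out-of-memory error when it asked for zero entries.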

3. If the exclude list does not contain 'lo', or the include list
contains 'lo', the job hangs when using multiple nodes:
[tprins_at_odin examples]$ mpirun -mca btl tcp,self -mca btl_tcp_if_exclude ib0 -np 2 -bynode ./ring_c
Process 0 sending 10 to 1, tag 201 (2 processes in ring)
[odin011][1,0][btl_tcp_endpoint.c:619:mca_btl_tcp_endpoint_complete_connect] connect() failed: Connection refused (111)
<hang>
[tprins_at_odin examples]$ mpirun -mca btl tcp,self -mca btl_tcp_if_include eth0,lo -np 2 -bynode ./ring_c
Process 0 sending 10 to 1, tag 201 (2 processes in ring)
[odin011][1,0][btl_tcp_endpoint.c:619:mca_btl_tcp_endpoint_complete_connect] connect() failed: Connection refused (111)
<hang>
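
The refused connection would be consistent with a process trying to reach a peer on another node through a loopback address, which always refers to the local machine. That is a guess on my part, not something I have confirmed in the code. If it is the cause, one possible guard is to skip loopback addresses when choosing an address for a remote peer. Here is a minimal IPv4-only sketch of such a check; is_ipv4_loopback is a hypothetical name.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>

/* Hypothetical helper: returns nonzero if the IPv4 address lies in
 * 127.0.0.0/8, the range reserved for loopback. */
int is_ipv4_loopback(const struct in_addr *addr);

int is_ipv4_loopback(const struct in_addr *addr)
{
    /* s_addr is stored in network byte order; convert before inspecting
     * the most significant octet. */
    uint32_t host_order = ntohl(addr->s_addr);

    return (host_order >> 24) == 127;
}
```

An address selection loop could call this and ignore matching addresses for peers that are not on the local node. IPv6 would need a separate check against ::1.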

However, the great news about this patch is that it appears to fix
https://svn.open-mpi.org/trac/ompi/ticket/1027 for me.

Hope this helps,

Tim

Adrian Knoth wrote:
> On Wed, Jan 30, 2008 at 06:48:54PM +0100, Adrian Knoth wrote:
>
>>> What is the real issue behind this whole discussion?
>> Hanging connections.
>> I'll have a look at it tomorrow.
>
> To everybody who's interested in BTL-TCP, especially George and (to a
> minor degree) rhc:
>
> I've integrated something that I call "magic address selection code".
> See the comments in r17348.
>
> Can you check
>
> https://svn.open-mpi.org/svn/ompi/tmp-public/btl-tcp
>
> if it's working for you? Read: multi-rail TCP, FNN, whatever is
> important to you?
>
>
> The code is proof of concept and could use a little tuning (if it's
> working at all. Over here, it satisfies all tests).
>
> I vaguely remember that at least Ralph doesn't like
>
> int a[perm_size * sizeof(int)];
>
> where perm_size is dynamically evaluated (read: array size is runtime
> dependent)
>
> There are also some large arrays, search for MAX_KERNEL_INTERFACE_INDEX.
> Perhaps it's better to replace them with an appropriate OMPI data
> structure. I don't know what fits best, you guys know the details...
>
>
> So please give the code a try, and if it's working, feel free to clean up
> whatever is necessary to make it the OMPI style or give me some pointers
> on what to change.
>
>
> I'd like to point to Thomas' diploma thesis. The PDF explains the theory
> behind the code, it's like a rationale. Unfortunately, the PDF has some
> typos, but I guess you'll get the idea. It's a graph matching algorithm,
> Chapter 3 covers everything in detail:
>
> http://cluster.inf-ra.uni-jena.de/~adi/peiselt-thesis.pdf
>
>
> HTH
>
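
For readers who have not looked at the branch: the function names in the warnings above (evaluate_assignment, visit) and the perm_size array suggest that the code enumerates permutations of interfaces and scores each one. The following is a brute-force sketch of that general idea only, under my own assumptions. It is not the code from the patch and not necessarily the algorithm from the thesis. NUM_IF, the weight matrix, and all function names are hypothetical.

```c
#include <stddef.h>

#define NUM_IF 3   /* hypothetical: number of interfaces on each side */

/* weight[l][r] is a hypothetical quality score for pairing local
 * interface l with remote interface r (0 meaning "not reachable"). */
static int score_assignment(int weight[NUM_IF][NUM_IF], const int perm[NUM_IF])
{
    int l;
    int total = 0;

    for (l = 0; l < NUM_IF; l++) {
        total += weight[l][perm[l]];
    }
    return total;
}

/* Recursively extend perm[0..depth-1] with every unused remote
 * interface and remember the best complete assignment seen so far. */
static void search(int weight[NUM_IF][NUM_IF], int perm[NUM_IF],
                   int used[NUM_IF], int depth,
                   int best[NUM_IF], int *best_score)
{
    int r;

    if (NUM_IF == depth) {
        int score = score_assignment(weight, perm);
        if (score > *best_score) {
            *best_score = score;
            for (r = 0; r < NUM_IF; r++) {
                best[r] = perm[r];
            }
        }
        return;
    }
    for (r = 0; r < NUM_IF; r++) {
        if (used[r]) {
            continue;
        }
        used[r] = 1;
        perm[depth] = r;
        search(weight, perm, used, depth + 1, best, best_score);
        used[r] = 0;
    }
}

/* Fills best[l] with the remote interface chosen for local interface l
 * and returns the total score of that assignment. */
int best_assignment(int weight[NUM_IF][NUM_IF], int best[NUM_IF]);

int best_assignment(int weight[NUM_IF][NUM_IF], int best[NUM_IF])
{
    int perm[NUM_IF];
    int used[NUM_IF] = {0};
    int best_score = -1;

    search(weight, perm, used, 0, best, &best_score);
    return best_score;
}
```

Enumerating all permutations costs factorial time in the number of interfaces, which is fine for a handful of interfaces but is one reason a fixed, small bound (rather than a runtime-sized array) may be acceptable here.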