Open MPI User's Mailing List Archives

From: Donald Kerr (Don.Kerr_at_[hidden])
Date: 2006-08-10 17:18:12


Hey Andrew I have one for you...

I get the following error message on a node that does not have any IB cards
--------------------------------------------------------------------------
[0,1,0]: uDAPL on host burl-ct-v40z-0 was unable to find any NICs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------

but I don't see this for the openib btl. Why udapl and not openib? Am I
missing something?

-DON

Andrew Friedley wrote on 08/10/06 17:06:

>Hopefully some of the other developers will correct me if I am wrong..
>
>Brock Palen wrote:
>
>
>>I had a user ask this; it's not a very practical question, but I am
>>curious.
>>
>>
>
>This is good information for the archives :)
>
>
>
>>OMPI uses a 'fast' network if it's available (IB, GM, etc.). I also
>>infer that for processes on the same SMP node the sm (shared memory)
>>btl is used, even if the job has more than one node given to it? The
>>real question is what happens if a job is given three nodes, two have
>>IB adapters and all have ethernet. Will the entire job use TCP for
>>processes on different nodes and shared memory intra-node? Or will the
>>two that have IB connections use IB to communicate and only use TCP
>>when talking to the third host that does not have IB?
>>
>>
>
>You infer correctly - sm is just considered to be another network we
>support.
>
>The two nodes with IB will use IB to communicate with each other, and
>ethernet (TCP) to communicate with the third node that lacks IB. This
>works the same for shared memory - MPI processes on the same node will
>use SM to communicate, and use, say, IB or TCP to communicate off-node.
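One way to watch this per-peer selection happen (a sketch; the host names and application are placeholders, and the exact log messages vary by Open MPI version) is to raise the BTL framework's verbosity:

```shell
# Run a job across three hosts and log BTL activity so you can see
# which transport each process pair ends up using.
mpirun -np 6 --host node1,node2,node3 \
    --mca btl_base_verbose 30 ./my_mpi_app
```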
>
>
>
>>Second would it be safe to say OMPI searches the BTL's in the
>>following order when trying to reach a process?
>>Self
>>SM
>>IB, GM, MX, MVAPI
>>TCP
>>
>>
>
>Actually, each BTL has an exclusivity value that we use to choose which
>BTL is given preference when we have several BTLs available for
>communication. A quick grep shows you're pretty much right on:
>
>$ ompi_info --all|grep exclusivity
> MCA btl: parameter "btl_openib_exclusivity" (current value: "1024")
> MCA btl: parameter "btl_self_exclusivity" (current value: "65536")
> MCA btl: parameter "btl_sm_exclusivity" (current value: "65535")
> MCA btl: parameter "btl_tcp_exclusivity" (current value: "0")
>
>These of course can be tuned, though expect trouble if you give
>something higher exclusivity than self. These numbers have no real
>meaning other than their relation to one another. For example, changing
>openib's exclusivity to 65000 won't change when/how it is used among the
>BTLs I have above, though it would possibly change relative to
>GM/MX/MVAPI if they're present.
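As a sketch of that tuning (the values and application name here are illustrative only; as noted above, only the ordering of the numbers matters), exclusivity can be overridden on the command line like any other MCA parameter:

```shell
# Prefer GM over openib by giving it the higher exclusivity value.
# Both numbers are made up for illustration; keep them below
# btl_self_exclusivity (65536) to avoid trouble.
mpirun -np 2 \
    --mca btl_gm_exclusivity 2048 \
    --mca btl_openib_exclusivity 1024 \
    ./my_mpi_app
```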
>
>
>
>>Third, what about a hypothetical case when a node has both GM and IB
>>on it? (evaluation machines)
>>
>>
>
>(This is where I might be wrong) The network with the highest
>exclusivity is used for sending of eager messages and the initial part
>of large messages using rendezvous protocol. Beyond that, large message
>data is striped across all available BTLs for more bandwidth.
>
>You probably know already that the 'btl' MCA parameter can be used to
>select a set of BTLs at runtime, ie to just use IB (and self).
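As a concrete example of that parameter (the application name is a placeholder):

```shell
# Restrict the job to the self, shared-memory, and openib BTLs;
# TCP and any other available BTLs will not be considered.
mpirun -np 8 --mca btl self,sm,openib ./my_mpi_app
```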
>
>
>
>>Last, does OMPI build something like a route list when MPI_Init() is
>>called, so that it knows how to reach the other members of the job?
>>Or is this done the first time a process needs to talk to another
>>process, so that no unneeded route information is stored?
>>
>>
>
>Yes - at initialization time (and when processes are dynamically added),
>each BTL is responsible for determining which other processes it can
>communicate with. This information is pushed back up to the higher
>levels (BML/PML) for use in scheduling decisions.
>
>However, those BTLs that communicate over point-to-point connection
>pairs do not establish connections until data needs to be sent (lazy
>connection establishment). This way we do not immediately set up N^2
>connections, but instead only as each pairwise communication path is used.
>
>The route information consumes relatively few resources compared to all
>the buffers and handles that must be allocated for connections in most
>of the BTLs.
>
>
>
>>p.s. not having to recompile code for different networks has made
>>evaluating networks so much more enjoyable. Thank you for all the
>>work on making the selection of networks 'just work'.
>>
>>
>
>That was our goal: stuff should just work. Glad you appreciate it as
>much as we do.
>
>Andrew
>_______________________________________________
>users mailing list
>users_at_[hidden]
>http://www.open-mpi.org/mailman/listinfo.cgi/users