On Mar 13, 2009, at 6:47 AM, Ricardo Fernández-Perea wrote:
> On the same machine, the same job takes much longer under XGrid than
> with any other launch method. Even when all the orteds run on the
> same node, XGrid uses tcp instead of sm. Is that expected, or do I
> have a problem?

This is unfortunately a known issue. Because XGrid doesn't give any
way of knowing where to launch until the processes are already
started, and doesn't handle wire-up, I had to fake a couple of things
when I initially wrote the code. In particular, our run-time really
wanted to know if two processes were on the same node *before* the
launch (so that it would know if they could share a control daemon).
That part is still a problem, although it may be solvable now, given
the changes made to the run-time since I wrote that code.

If the world were perfect, I'd launch only the executables and skip the
daemons. The problem with that model is that xgrid's stdio forwarding
is a little different than what most users expect. It is (or was)
nearly impossible to get "real time" stdio output from the processes
without handling it all ourselves, which requires the previously
mentioned, slightly evil, daemons.

All this leads up to the short answer to your question - it's expected
that two processes on the same node with xgrid will use tcp instead of
shared memory for communication. This could probably be fixed with
some extra coding, but unfortunately I'm totally swamped on another
project (and trying to finish my thesis), so it's unlikely I'll be
able to look at it for a while.
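
To picture why missing placement information ends in tcp, here is a toy
model in Python. It is only a sketch, not Open MPI's actual selection
logic: choose_transport, node_of, and the node names are hypothetical,
and the real run-time makes this decision in C with far more state. The
only idea it shows is that shared memory is safe to pick only when two
processes are known to be on the same node, so unknown placement falls
back to tcp.

```python
from typing import Dict, Optional


def choose_transport(rank_a: int, rank_b: int,
                     node_of: Dict[int, Optional[str]]) -> str:
    """Return 'sm' only if both ranks are known to share a node.

    node_of maps a rank to its node name, or to None when the
    launcher has not reported where that rank will run.
    (Hypothetical helper, for illustration only.)
    """
    node_a = node_of.get(rank_a)
    node_b = node_of.get(rank_b)
    # Unknown placement cannot be treated as "same node", so it
    # falls through to the tcp case.
    if node_a is not None and node_a == node_b:
        return "sm"
    return "tcp"


# Placement reported before launch (assumed node names).
known = {0: "node01", 1: "node01", 2: "node02"}

# Placement not available at wire-up time, which is the situation
# described above for XGrid.
unknown = {0: None, 1: None, 2: None}

print(choose_transport(0, 1, known))
print(choose_transport(0, 2, known))
print(choose_transport(0, 1, unknown))
```

With the known map, ranks 0 and 1 get sm and ranks 0 and 2 get tcp. With
the unknown map, every pair gets tcp even if the processes really do
share a node, which matches the behaviour you are seeing.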
Open MPI developer