----- Original Message ----
From: Josh Hursey <jjhursey@open-mpi.org>
To: Open MPI Developers <devel@open-mpi.org>
Sent: Wednesday, March 19, 2008 2:20:59 AM
Subject: Re: [OMPI devel] xensocket btl and migration

Muhammad,

With regard to your question on migration, you will likely have to
reload the BTL components when a migration occurs. Open MPI currently
assumes that once the set of BTLs is decided upon in a process, they
are used until the application completes.

There is some limited support for failover, in which if one BTL 'fails'
it is disregarded and a previously defined alternative path is used.
For example, if between two peers Open MPI has the choice of using tcp
or openib, then it will use openib. If openib were to fail during the
running of the job, then it may be possible for Open MPI to fail over
and use just tcp. I'm not sure how well tested this ability is; others
can comment if you are interested in this.

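To make the failover idea concrete, here is a rough sketch in C. It is
only an illustration of "prefer the best path, disregard one that has
failed"; the struct, the function names and the priority values are
made up for this example and are not Open MPI's actual data structures
or API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-peer transport entry; not an Open MPI structure. */
struct transport {
    const char *name;  /* e.g. "openib" or "tcp" */
    int priority;      /* higher is preferred (illustrative values) */
    int failed;        /* nonzero once the transport has reported an error */
};

/* Return the highest-priority transport that has not failed, or NULL
 * when no usable path to the peer remains. */
struct transport *select_path(struct transport *paths, size_t n)
{
    struct transport *best = NULL;

    assert(paths != NULL);
    for (size_t i = 0; i < n; i++) {
        if (paths[i].failed)
            continue;  /* a failed path is disregarded */
        if (best == NULL || paths[i].priority > best->priority)
            best = &paths[i];
    }
    return best;
}

/* Mark every transport with the given name as failed, so that later
 * calls to select_path() fall back to the remaining alternatives. */
void mark_failed(struct transport *paths, size_t n, const char *name)
{
    assert(paths != NULL && name != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(paths[i].name, name) == 0)
            paths[i].failed = 1;
    }
}
```

With a table holding tcp (lower priority) and openib (higher priority),
select_path() picks openib first; after mark_failed(..., "openib") it
returns tcp. Note that this only ever moves "down" the list, which is
why failover alone does not cover the migration case.
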
However, failover is not really what you are looking for. What it seems
you are looking for is the ability to tell two processes that they
should no longer communicate over tcp, but continue communication over
xensockets, or vice versa.

One technique would be, upon migration: if you unload the BTLs
(component_close), then reopen (component_open) and reselect
(component_select) them, then reexchange the modex, the processes
should settle into the new configuration. You will have to make sure
that any state Open MPI has cached, such as network addresses and node
name data, is refreshed upon restart. Take a look at the
checkpoint/restart logic for how I do this in the code base
([opal|orte|ompi]/runtime/*_cr.c).

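As a rough outline of that sequence, the sketch below runs the steps in
order and reports the first one that fails. Everything in it is
hypothetical: the hook struct, the function names and the demo hooks
are placeholders for whatever the real close/open/select/modex calls
turn out to be, and refreshing the cached state right after the close
is my assumption about a sensible ordering, not something the code
base dictates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hooks standing in for the real framework calls. */
struct reload_hooks {
    int (*close_components)(void *ctx);     /* cf. component_close */
    int (*refresh_cached_state)(void *ctx); /* addresses, node names */
    int (*open_components)(void *ctx);      /* cf. component_open */
    int (*select_components)(void *ctx);    /* cf. component_select */
    int (*exchange_modex)(void *ctx);       /* republish, re-read peers */
};

/* Steps in the order they run; STEP_DONE means all of them succeeded. */
enum reload_step {
    STEP_CLOSE, STEP_REFRESH, STEP_OPEN, STEP_SELECT, STEP_MODEX, STEP_DONE
};

/* Run each step in order. Returns STEP_DONE on success, otherwise the
 * step that failed (a missing hook counts as a failure). */
enum reload_step reload_after_migration(const struct reload_hooks *h,
                                        void *ctx)
{
    assert(h != NULL);
    int (*const steps[])(void *) = {
        h->close_components, h->refresh_cached_state, h->open_components,
        h->select_components, h->exchange_modex,
    };

    for (int i = 0; i < STEP_DONE; i++) {
        if (steps[i] == NULL || steps[i](ctx) != 0)
            return (enum reload_step)i;
    }
    return STEP_DONE;
}

/* Demo-only context: records which steps ran and can inject a failure. */
struct reload_trace {
    char order[8];  /* one letter per step that ran, NUL terminated */
    int len;        /* number of steps recorded so far */
    int fail_at;    /* index of the step that should fail, or -1 */
};

static int record(void *ctx, char tag)
{
    struct reload_trace *t = ctx;
    int idx = t->len;

    t->order[t->len++] = tag;
    t->order[t->len] = '\0';
    return idx == t->fail_at ? -1 : 0;
}

int demo_close(void *ctx)   { return record(ctx, 'C'); }
int demo_refresh(void *ctx) { return record(ctx, 'R'); }
int demo_open(void *ctx)    { return record(ctx, 'O'); }
int demo_select(void *ctx)  { return record(ctx, 'S'); }
int demo_modex(void *ctx)   { return record(ctx, 'M'); }

const struct reload_hooks demo_hooks = {
    .close_components = demo_close,
    .refresh_cached_state = demo_refresh,
    .open_components = demo_open,
    .select_components = demo_select,
    .exchange_modex = demo_modex,
};
```

Calling reload_after_migration(&demo_hooks, &trace) with fail_at set to
-1 runs all five steps; setting fail_at to 3 stops at the select step
and returns STEP_SELECT, which is where a real implementation would
have to decide whether to retry or abort.
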
It is likely that there is another, more efficient method, but I don't
have anything to point you to at the moment. One idea would be to add a
refresh function to the modex which would force the reexchange of a
single process's address set. There are a slew of problems with this
that you will have to overcome, including race conditions, but I think
they can be surmounted.

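One of those races is easy to picture: two refreshes for the same peer
arriving out of order, so that an old address overwrites a newer one.
The sketch below shows one possible guard, a per-peer epoch counter.
It is purely illustrative; no such refresh function exists in the
modex as far as this thread goes, and the struct, the function name,
the buffer size and the address strings in the usage note are all
invented for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ADDR_MAX 64  /* illustrative size for an opaque address blob */

/* Hypothetical cached modex entry for one peer. */
struct peer_entry {
    uint32_t epoch;       /* bumped by the owning process on each migration */
    char addr[ADDR_MAX];  /* last address set accepted for this peer */
};

/* Apply a refreshed address set for a single peer.
 * Returns 1 if the entry was updated, 0 if the update was rejected
 * because it is stale (epoch not newer) or does not fit the buffer.
 * Epoch wraparound is deliberately ignored in this sketch. */
int apply_refresh(struct peer_entry *e, uint32_t epoch, const char *addr)
{
    assert(e != NULL && addr != NULL);
    if (epoch <= e->epoch)
        return 0;  /* an older or duplicate refresh arrived late */
    if (strlen(addr) >= ADDR_MAX)
        return 0;  /* refuse rather than truncate */
    strcpy(e->addr, addr);
    e->epoch = epoch;
    return 1;
}
```

For example, if an entry at epoch 1 receives a refresh tagged epoch 3
and then a delayed one tagged epoch 2, the second call returns 0 and
the epoch 3 address stays in place. This does nothing for messages
already in flight on the old path, which is a separate problem.
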
I'd be interested in hearing your experiences implementing this in Open
MPI. Let me know if I can be of any more help.

Cheers,
Josh

On Mar 9, 2008, at 6:13 AM, Muhammad Atif wrote:
> Okay guys.. with all your support and help in understanding ompi
> architecture, I was able to get Xensocket to work. Only minor
> changes to the xensocket kernel module made it compatible with
> libevent. I am getting results which are bad but I am sure I have
> to clean up the code. At least my results have improved over native
> netfront-netback of xen for messages of size larger than 1 MB.
>
> I started with making minor changes in the TCP btl, but it seems it
> is not the best way, as changes are quite huge and it is better to
> have a separate dedicated btl for xensockets. As you guys might be
> aware, Xen supports live migration; now I have one stupid question.
> My knowledge so far suggests that a btl component is initialized only
> once. The scenario here is if my guest os is migrated from one
> physical node to another, and realizes that the communicating
> processes are now on one physical host and they should abandon use
> of TCP btl and make use of Xensocket btl. I am sure it would not
> happen out of the box, but is it possible without making heavy
> changes in the openmpi architecture?
> With the current design, I am running a mix of tcp and xensocket
> btls, and endpoints check periodically if they are on the same
> physical host or not. This has quite a big penalty in terms of time.
>
> Another question is (good thing I am using email otherwise you guys
> would beat the hell outta me, it's such a basic question). I am not
> able to track the MPI_Recv(...) api call and similar calls. Once in
> the code of MPI_Recv(..) we give a call to rc =
> MCA_PML_CALL(recv(buf, count ... ). This call goes to the macro, and
> pml.recv(..) gets invoked (mca_pml_base_module_recv_fn_t
> pml_recv;). Where can I find the actual function? I get totally
> lost when trying to pinpoint what exactly is happening. Basically, I
> am looking for a place where tcp btl recv is getting called with all
> the goodies and parameters which were passed by the MPI programmer.
> I hope I have made my question understandable.
>
> Best Regards,
> Muhammad Atif
>
>
> ----- Original Message ----
> From: Brian W. Barrett <brbarret@open-mpi.org>
> To: Open MPI Developers <devel@open-mpi.org>
> Sent: Wednesday, February 6, 2008 2:57:31 AM
> Subject: Re: [OMPI devel] xensocket - callbacks through OPAL/libevent
>
> On Mon, 4 Feb 2008, Muhammad Atif wrote:
>
> > I am trying to port xensockets to openmpi. In principle, I have the
> > framework and everything, but there seems to be a small issue: I
> > cannot get libevent (or OPAL) to give callbacks for receive (or
> > send) for xensockets. I have tried to implement native code for
> > xensockets with the libevent library, again the same issue. No
> > callbacks! With normal sockets, callbacks do come easily.
> >
> > So question is, do the socket/file descriptors have to have some
> > special mechanism attached to them to support callbacks for
> > libevent/opal? i.e. some structure/magic? i.e. maybe the developers
> > of xensockets did not add that callback/interrupt thing at the time
> > of creation. Xensockets is open source, but my knowledge about
> > these issues is limited. So I thought some pointer in the right
> > direction might be useful.
>
> Yes and no :). As you discovered, the OPAL interface just repackages
> a library called libevent to handle its socket multiplexing. Libevent
> can use a number of different mechanisms to look for activity on
> sockets, including select() and poll() calls. On Linux, it will
> generally use poll(). poll() requires some kernel support to do its
> thing, so if Xensockets doesn't implement the right magic to trigger
> poll() events, then libevent won't work for Xensockets. There's
> really nothing you can do from the Open MPI front to work around this
> issue -- it would have to be fixed as part of Xensockets.
>
> > Second question is, what if we cannot have the callbacks. What is
> > the recommended way to implement the btl component for such a
> > device? Do we need to do this with event timers?
>
> Have a look at any of the BTLs that isn't TCP -- none of them use
> libevent callbacks for progress. Instead, they provide a progress
> function as part of the BTL interface, which is called on a regular
> basis whenever progress needs to be made.
>
> Brian
> _______________________________________________
> devel mailing list
> devel@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel