Hello,
This is essentially a repost of my previous mail about problems
with MPI_Comm_connect/MPI_Comm_accept and MPI_Comm_disconnect
(*this is not related to spawning, parent/child*).
I *sometimes* find processes blocking indefinitely in the
connect/accept calls or in the disconnect calls. An example is
below.
Process A
{
    MPI_Open_port(...);
    MPI_Publish_name(...);
    MPI_Comm_accept(... &b_comm);   // -----> (1)
    // Do something1
    MPI_Comm_disconnect(&b_comm);   // -----> (2)
    // Do something2
}
Process B
{
    MPI_Lookup_name(...);
    MPI_Comm_connect(... &a_comm);  // -----> (1)
    // Do something1
    MPI_Comm_disconnect(&a_comm);   // -----> (2)
    // Do something2
}
In the above scenario, even when A reaches (1) without any
problems, B *sometimes* blocks indefinitely at its (1). The
arguments passed to both functions are correct.
Likewise, *sometimes* one of them blocks indefinitely at (2)
while the other goes on to do its something2. I would suspect a
problem at the application level only if the process that blocks
were always the same one, but it is not: sometimes A blocks while
B is busy doing its something2, and sometimes B blocks while A is
busy doing its something2.
Is this a known issue? Or am I the only person experiencing this,
and do these calls work cleanly for others who frequently use
connect/accept/disconnect?
Thanks,
Suraj