
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] [Fwd: multi-threaded test]
From: George Bosilca (bosilca_at_[hidden])
Date: 2011-03-14 21:58:19


On Mar 12, 2011, at 03:51 , N.M. Maclaren wrote:

> On Mar 12 2011, George Bosilca wrote:
>
>> Removing thread support is _NOT_ an option (https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/MPI3Hybrid).
>>
>> Unlike the usual claims on this mailing list, MPI_THREAD_MULTIPLE had been fully supported for several BTLs in Open MPI (http://www.springerlink.com/content/lmh1144p51317313/). The long term goal is to go back to at least the same level of support, and not to totally annihilate the efforts put into this in the past.
>
> You have clearly misunderstood what I was posting, and I am not
> sure that you understand the problem I am describing. The problem is
> NOT whether OpenMPI can claim to support it, or even make it work
> most of the time - that's almost trivial. I will attempt to clarify,
> and then will not continue unless there is something new.
> The problems have NOTHING WHATSOEVER to do with the transfer library
> layer, which is why I said that threads used behind the scenes are
> not a problem.

Nobody challenged your statements about threading or about the correctness of the POSIX standard. However, such concerns are better voiced on forums related to that specific subject, where they have a chance to be taken into account by people who understand them.

This particular topic was about MPI-level threading support, and more specifically about the threading support Open MPI would like to provide. In this limited context, people interested in using threads and MPI together are well aware of the limitations imposed on their applications, as well as the pitfalls they have to avoid. Moreover, with the new endpoint proposal (the one I pointed to in my previous email), threads will become first-class citizens in MPI.

 george.

> The killer is that the languages and system specifications do not make
> it possible to implement such things reliably, let alone portably to
> almost all conforming systems. And the issues do NOT normally arise in
> what the OpenMPI code does, but in what the USER code does that
> interacts with what the OpenMPI code does or does not do.
>
>
> Take that damn signal handling fiasco, and assume that thread T in
> process P does something that triggers an asynchronous signal. To my
> certain knowledge, that may be delivered to T, another thread T1 in P,
> all threads in P, P itself, or a group of processes including P, and
> there are essentially no facilities to control this or even to find out
> which has happened. When one thread 'handles' that signal, it may clear
> the signal from all or some of the other threads and processes that have
> it pending - but there are NO facilities to enforce synchronisation, and
> the normal memory synchronisation primitives don't do it!
>
> So you have an INSOLUBLE race condition, which will have the usual effect
> of showing up as a very low probability, non-repeatable misbehaviour.
>
> Another one I have seen, that is equally unspecified and unreliable, is
> kernel scheduling. There is no way for one thread to say 'run thread T1
> next' - all it can do is to fiddle priorities, and no system that I know
> of implements those in the way the specification indicates. I have seen
> a thread T waiting on an event to be caused by thread T1, but have had
> no way to get T1 to actually run, for any one of several complicated
> reasons. This can happen to processes, too, but there are at least
> SOME tools to get out of the hole!
>
> And then there are the old, old issues with file descriptor ownership.
> Under MVS, you could read and write a file from any task (thread), but
> only extend the file or close it from the thread that opened it, and
> occasionally writing needed extension. Oops. Well, I have seen that
> one on Unix sockets, too. But it probably isn't extant, until you start
> considering programs that use setuid/setgid/setsid/etc. - yes, they
> affect all threads, in theory, but how are they synchronised?
>
> And so it goes. There are DOZENS of other gotchas, many of which I have
> seen arise on real systems. And, no, they are NOT bugs, because the
> standards don't say what should happen.
>
>
> This area is a complete mess, which is why all experienced software
> engineers batten down the hatches, switch on maximum paranoia mode, and
> use the most cautious approach that they can get away with. And, even
> then, they don't trust anything and insert lots of internal checking
> to try to detect when something unexpected has happened, and their
> environment has gone pear-shaped.
>
>
> Regards,
> Nick Maclaren.
>
>
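
Editorial illustration of the signal-delivery concern raised in the quoted message: one cautious pattern on POSIX systems is to block a signal in every thread and let a single dedicated thread accept it synchronously with sigwait(), so that no asynchronous handler ever runs. The sketch below is a minimal example under stated assumptions and is not taken from Open MPI or from either author. The names waiter_ctx, signal_waiter and demo_dedicated_signal_thread are hypothetical, SIGUSR1 is an arbitrary choice, and the code only covers a signal sent to the process itself. It does not address the cross-process or unspecified cases described above.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <pthread.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical context shared between the creating thread and the waiter. */
struct waiter_ctx {
    sigset_t set;   /* signals the waiter thread accepts */
    int received;   /* signal number seen by sigwait(), 0 if none */
};

/* Runs in the single thread that is allowed to consume the signal. */
static void *signal_waiter(void *arg)
{
    struct waiter_ctx *ctx = arg;
    int signo = 0;

    /* sigwait() takes a pending blocked signal synchronously, so no
     * asynchronous handler runs in any thread. */
    if (sigwait(&ctx->set, &signo) == 0)
        ctx->received = signo;
    return NULL;
}

/* Returns the signal number the waiter thread received, or -1 on error. */
int demo_dedicated_signal_thread(void)
{
    struct waiter_ctx ctx;
    sigset_t old;
    pthread_t tid;

    ctx.received = 0;
    sigemptyset(&ctx.set);
    sigaddset(&ctx.set, SIGUSR1);

    /* Block the signal before creating any thread: a new thread inherits
     * the signal mask of the thread that creates it. */
    if (pthread_sigmask(SIG_BLOCK, &ctx.set, &old) != 0)
        return -1;

    if (pthread_create(&tid, NULL, signal_waiter, &ctx) != 0) {
        pthread_sigmask(SIG_SETMASK, &old, NULL);
        return -1;
    }

    /* Process-directed signal. It is blocked in every thread, so it stays
     * pending until the waiter's sigwait() accepts it. */
    kill(getpid(), SIGUSR1);

    /* pthread_join() also makes the waiter's write to ctx.received visible. */
    pthread_join(tid, NULL);
    pthread_sigmask(SIG_SETMASK, &old, NULL);
    return ctx.received;
}
```

Calling demo_dedicated_signal_thread() early in a program, before other threads exist, is the case this sketch assumes. Threads created by third-party libraries before the mask is set would not inherit it, which is one instance of the user-code interaction the quoted message warns about.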

"I disapprove of what you say, but I will defend to the death your right to say it"
  -- Evelyn Beatrice Hall