Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] some questions regarding the portals modules
From: Jerome Soumagne (soumagne_at_[hidden])
Date: 2010-07-09 17:23:04

Hi Ken,

Thank you very much for your reply. I will think about it and run some
more tests. I was only thinking about using MPI threads but, as you say,
if two threads are scheduled on the same core, that would not be pretty
at all. I can probably test that functionality further, but I do not
expect great results.
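
For context, the check from mtl_portals_component.c quoted further down
can be sketched in isolation as follows. This is only a minimal sketch:
the type and function names (sketch_module_t, sketch_component_init) are
hypothetical stand-ins, not the actual Open MPI symbols, and it assumes
that returning NULL means the component declines to initialize, which is
the behaviour described in the quoted mail.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the module handle a component returns. */
typedef struct {
    const char *name;
} sketch_module_t;

static sketch_module_t sketch_portals_module = { "portals-sketch" };

/*
 * Hypothetical init hook (name invented for illustration).
 * Returns NULL as soon as either kind of thread support is requested,
 * otherwise returns the module handle.
 */
sketch_module_t *sketch_component_init(bool enable_progress_threads,
                                       bool enable_mpi_threads)
{
    if (enable_progress_threads || enable_mpi_threads) {
        return NULL;  /* component declines to initialize */
    }
    return &sketch_portals_module;
}
```

With a gate like this, a build with either thread option switched on never
obtains the module, whatever the underlying library could do.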

I'm not sure I correctly understand what you say about spawn. I found a
presentation on the web by Richard Graham saying that the spawn
functionality is implemented; the same presentation also says that you
get full MPI-2 support on the Cray XT. When I said that I had problems
with the MPI_Comm_accept/connect functions, I meant that I actually get
errors when I try a "simple" MPI_Open_port. Do you know where in the
code I can find out whether this function is implemented? If it is,
knowing where it is defined would help me find the origin of my problem
and possibly extend support for this functionality (if that is
feasible). I would like to be able to link two different jobs together
using these functions, i.e., to create a communicator between the jobs.



On 07/09/2010 07:16 PM, Matney Sr, Kenneth D. wrote:
> Hello Jerome,
> The first one is simple. Portals is not thread-safe on the Cray XT. As I recall,
> only the master thread can post an event, although any thread can receive
> the event. I might have it backwards, though. It has been a couple of years
> since I played with this.
> The second one depends on how you use your Cray XT. In our case, the machine
> is used as process-per-core; i.e., not as a collection of SMPs. For performance
> reasons, you definitely do not want MPI threads. Also, since it is run process-per-core,
> there is nothing to be gained with progress threads. Portals events will generate a kernel
> level interrupt. Whether you can run the XT as a cluster of SMPs is another question
> entirely. We really have not tried this in the context of OMPI. But, in conjunction with
> portals, this might open a "can of worms". For example, any thread can be run on any
> core. But the portals ID for a thread will be the NID/PID pair for that core. If two threads
> get scheduled to the same core, it would not be pretty.
> I could see lots of reasons why spawn might fail. First, it is run on a compute node.
> There is no way for a compute node to run a process on another compute node.
> Also, there will be no rank/size initialization forthcoming from ALPS. So, even if
> it got past this, it would be running on the same node as its parent.
> -- Ken Matney, Sr.
> Oak Ridge National Laboratory
> On Jul 9, 2010, at 7:53 AM, Jerome Soumagne wrote:
> Hi,
> As I said in the previous e-mail, we've recently installed OpenMPI on a Cray XT5 machine, and we therefore use the portals and the alps libraries. Thanks for providing the configuration script from Jaguar; it was very helpful and just had to be slightly adapted in order to use the latest CNL version installed on this machine.
> I have some questions though regarding the use of the portals btl and mtl components. I noticed that when I compiled OpenMPI with mpi-thread support enabled and ran a job, the portals components did not want to initialize due to these funny lines:
> ./mtl_portals_component.c (lines 182-183):
>     /* we don't run with no stinkin' threads */
>     if (enable_progress_threads || enable_mpi_threads) return NULL;
> I'd like to know why MPI threads are disabled, since threads are supported on the XT5. Does the btl/mtl require thread safety to be implemented, or is it because of the portals library itself?
> I would also like to use the MPI_Comm_accept/connect functions. It seems that this is not possible with the portals mtl, even though spawn seems to be supported. Did I do something wrong, or is it really not supported?
> If it is not supported, is it possible to extend this module to support these functions? We could help with that.
> I'd also like to know whether there are any plans to create a module for the DMAPP interface of the Gemini interconnect.
> Thanks.
> Jerome
> --
> Jérôme Soumagne
> Scientific Computing Research Group
> CSCS, Swiss National Supercomputing Centre
> Galleria 2, Via Cantonale | Tel: +41 (0)91 610 8258
> CH-6928 Manno, Switzerland | Fax: +41 (0)91 610 8282