Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Dynamic processes connection and segfault on MPI_Comm_accept
From: Ralph Castain (rhc_at_[hidden])
Date: 2010-04-24 14:03:35


Actually, OMPI is distributed with a daemon that does pretty much what you want. Check out "man ompi-server". I originally wrote that code to support cross-application MPI publish/subscribe operations, but we can utilize it here too. I have to take the blame for not making it more widely known.

The attached patch upgrades ompi-server and modifies the singleton startup to provide your desired support. This solution works in the following manner:

1. Launch "ompi-server -report-uri <filename>". This starts a persistent daemon called "ompi-server" that acts as a rendezvous point for independently started applications. The problem with starting different applications and wanting them to MPI connect/accept is that the applications need to find each other: if they can't discover contact info for the other app, then they can't wire up their interconnects. The "ompi-server" tool provides that rendezvous point. (Incidentally, I don't like that comm_accept segfaulted; it should have just returned an error.)

2. Set "OMPI_MCA_orte_server=file:<filename>" in the environment where you will start your processes. This will allow your singleton processes to find the ompi-server. I also automatically set the envar that connects the MPI publish/subscribe system for you.

3. Run your processes. Since they think they are singletons, they will detect the presence of the above envar and automatically connect themselves to the "ompi-server" daemon. This gives each process the ability to perform any MPI-2 operation.
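Since step 3 only works if every process can actually read the contact file named in step 2, a small pre-flight check can help when processes are started by some other launcher. The C sketch below is purely illustrative: the helper names `has_file_prefix` and `read_server_contact` are hypothetical and are not Open MPI code (Open MPI does the real discovery itself once the envar is set). It assumes, for illustration only, that the contact info written by `-report-uri` fits on the first line of the file, and it accepts the prefix in any letter case because this thread uses both `file:` and `FILE:`.

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, NOT part of Open MPI: returns 1 if spec starts
 * with "file:" in any letter case, 0 otherwise. */
static int has_file_prefix(const char *spec)
{
    const char *prefix = "file:";
    for (size_t i = 0; prefix[i] != '\0'; i++) {
        /* Stop at the terminator so short strings are never over-read. */
        if (spec[i] == '\0' ||
            tolower((unsigned char)spec[i]) != prefix[i])
            return 0;
    }
    return 1;
}

/* Hypothetical helper, NOT part of Open MPI: given a value of the form
 * "file:<filename>" (e.g. getenv("OMPI_MCA_orte_server")), opens
 * <filename> and copies its first line into buf without the newline.
 * Returns 0 on success, -1 if spec is missing/malformed or the file
 * cannot be read. */
int read_server_contact(const char *spec, char *buf, size_t buflen)
{
    const size_t plen = strlen("file:");

    if (spec == NULL || buflen < 2 || !has_file_prefix(spec) ||
        spec[plen] == '\0')
        return -1;

    FILE *fp = fopen(spec + plen, "r");
    if (fp == NULL)
        return -1;
    int n = buflen > (size_t)INT_MAX ? INT_MAX : (int)buflen;
    char *line = fgets(buf, n, fp);
    fclose(fp);
    if (line == NULL)
        return -1;

    buf[strcspn(buf, "\r\n")] = '\0';  /* drop the trailing newline */
    return 0;
}
```

A process (or a wrapper script's helper binary) could call `read_server_contact(getenv("OMPI_MCA_orte_server"), buf, sizeof buf)` (with `<stdlib.h>` for `getenv`) and report an error early if it returns -1, rather than failing later inside connect/accept.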

I tested this on my machines and it worked, so hopefully it will meet your needs. You only need to run one "ompi-server", period, as long as you locate it where all of the processes can find the contact file and can open a TCP socket to the daemon. There is a way to knit multiple ompi-servers into a broader network (e.g., to connect processes that cannot directly access a server due to network segmentation), but it's a tad tricky; let me know if you require it and I'll try to help.

If you have trouble wiring them all into a single communicator, you might ask separately about that and see if one of our MPI experts can provide advice (I'm just the RTE grunt).

HTH. Let me know how this works for you and I'll incorporate it into future OMPI releases.
Ralph

On Apr 24, 2010, at 1:49 AM, Krzysztof Zarzycki wrote:

> Hi Ralph,
> I'm Krzysztof and I'm working with Grzegorz Maj on our small project/experiment.
>
> We would definitely like to give your patch a try. But could you please explain your solution a little more?
> Would you still start one mpirun per MPI grid, and then have the processes we start join the MPI comm?
> That is a good solution, of course.
> But it would be especially preferable to have one daemon running persistently on our "entry" machine that can handle several MPI grid starts. Can your patch help us this way too?
>
> Thanks for your help!
> Krzysztof
>
> On 24 April 2010 03:51, Ralph Castain <rhc_at_[hidden]> wrote:
> In thinking about this, my proposed solution won't entirely fix the problem - you'll still wind up with all those daemons. I believe I can resolve that one as well, but it would require a patch.
>
> Would you like me to send you something you could try? Might take a couple of iterations to get it right...
>
> On Apr 23, 2010, at 12:12 PM, Ralph Castain wrote:
>
> > Hmmm....I -think- this will work, but I cannot guarantee it:
> >
> > 1. Launch one process (it can just be a spinner) using mpirun with the following option:
> >
> > mpirun -report-uri file
> >
> > where file is a filename that mpirun can create and write its contact info into. This can be a relative or absolute path. This process must remain alive throughout your application; it doesn't matter what it does. Its purpose is solely to keep mpirun alive.
> >
> > 2. Set OMPI_MCA_dpm_orte_server=FILE:file in your environment, where "file" is the filename given above. This will tell your processes how to find mpirun, which acts as a meeting place to handle the connect/accept operations.
> >
> > Now run your processes, and have them connect/accept to each other.
> >
> > The reason I cannot guarantee this will work is that these processes will all have the same rank and name, since they all start as singletons. Hence, connect/accept is likely to fail.
> >
> > But it -might- work, so you might want to give it a try.
> >
> > On Apr 23, 2010, at 8:10 AM, Grzegorz Maj wrote:
> >
> >> To be more precise: by 'server process' I mean some process that I
> >> could run once on my system and that could help in creating those
> >> groups.
> >> My typical scenario is:
> >> 1. run N separate processes, each without mpirun
> >> 2. connect them into MPI group
> >> 3. do some job
> >> 4. exit all N processes
> >> 5. goto 1
> >>
> >> 2010/4/23 Grzegorz Maj <maju3_at_[hidden]>:
> >>> Thank you Ralph for your explanation.
> >>> And, apart from that file-descriptor issue, is there any other way to
> >>> solve my problem, i.e. to run a number of processes separately,
> >>> without mpirun, and then collect them into an MPI intracomm group?
> >>> If, for example, I need to run some 'server process' (even using
> >>> mpirun) for this task, that's OK. Any ideas?
> >>>
> >>> Thanks,
> >>> Grzegorz Maj
> >>>
> >>>
> >>> 2010/4/18 Ralph Castain <rhc_at_[hidden]>:
> >>>> Okay, but here is the problem. If you don't use mpirun, and are not operating in an environment we support for "direct" launch (i.e., starting processes outside of mpirun), then every one of those processes thinks it is a singleton - yes?
> >>>>
> >>>> What you may not realize is that each singleton immediately fork/exec's an orted daemon that is configured to behave just like mpirun. This is required in order to support MPI-2 operations such as MPI_Comm_spawn, MPI_Comm_connect/accept, etc.
> >>>>
> >>>> So if you launch 64 processes that think they are singletons, then you have 64 copies of orted running as well. This eats up a lot of file descriptors, which is probably why you are hitting this 65-process limit: your system is probably running out of file descriptors. You might check your system limits and see if you can get them revised upward.
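Checking and raising the limit mentioned above can also be done from inside a program with the POSIX `getrlimit`/`setrlimit` calls on `RLIMIT_NOFILE`. The sketch below uses hypothetical helper names (`get_fd_limits`, `raise_fd_soft_limit`). An unprivileged process may raise its soft limit up to the hard limit; going beyond the hard limit needs administrator changes. Resource limits are inherited across fork/exec, so raising the soft limit before `MPI_Init` would presumably also cover the orted a singleton starts. Note that running out of per-process descriptors is only the likely cause suggested in the message above, not a confirmed diagnosis.

```c
#include <sys/resource.h>

/* Hypothetical helper: reports the current soft and hard limits on the
 * number of open file descriptors. Returns 0 on success, -1 on failure. */
int get_fd_limits(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    *soft = rl.rlim_cur;
    *hard = rl.rlim_max;
    return 0;
}

/* Hypothetical helper: raises the soft RLIMIT_NOFILE limit to the hard
 * limit. If the hard limit is RLIM_INFINITY the soft limit is left as-is,
 * since some systems reject an unlimited descriptor limit.
 * Returns 0 on success, -1 on failure. */
int raise_fd_soft_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (rl.rlim_max == RLIM_INFINITY || rl.rlim_cur == rl.rlim_max)
        return 0;                      /* nothing to do */
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_NOFILE, &rl) == 0 ? 0 : -1;
}
```

If the hard limit itself turns out to be too low for the number of singletons plus their orteds, it has to be raised by the system administrator (for example through the shell's `ulimit` or the system's limits configuration).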
> >>>>
> >>>>
> >>>> On Apr 17, 2010, at 4:24 PM, Grzegorz Maj wrote:
> >>>>
> >>>>> Yes, I know. The problem is that I need to use some special way for
> >>>>> running my processes provided by the environment in which I'm working
> >>>>> and unfortunately I can't use mpirun.
> >>>>>
> >>>>> 2010/4/18 Ralph Castain <rhc_at_[hidden]>:
> >>>>>> Guess I don't understand why you can't use mpirun; all it does is start things, provide a means to forward I/O, etc. It mainly sits there quietly without using any CPU unless required to support the job.
> >>>>>>
> >>>>>> Sounds like it would solve your problem. Otherwise, I know of no way to get all these processes into comm_world.
> >>>>>>
> >>>>>>
> >>>>>> On Apr 17, 2010, at 2:27 PM, Grzegorz Maj wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>> I'd like to dynamically create a group of processes communicating via
> >>>>>>> MPI. Those processes need to be run without mpirun and create an
> >>>>>>> intracommunicator after startup. Any ideas how to do this
> >>>>>>> efficiently?
> >>>>>>> I came up with a solution in which the processes are connecting one by
> >>>>>>> one using MPI_Comm_connect, but unfortunately all the processes that
> >>>>>>> are already in the group need to call MPI_Comm_accept. This means that
> >>>>>>> when the n-th process wants to connect I need to collect all the n-1
> >>>>>>> processes on the MPI_Comm_accept call. After I run about 40 processes
> >>>>>>> every subsequent call takes more and more time, which I'd like to
> >>>>>>> avoid.
> >>>>>>> Another problem in this solution is that when I try to connect the 66th
> >>>>>>> process, the root of the existing group segfaults on MPI_Comm_accept.
> >>>>>>> Maybe it's my bug, but it's weird, as everything works fine for up to
> >>>>>>> 65 processes. Is there any limitation I don't know about?
> >>>>>>> My last question is about MPI_COMM_WORLD. When I run my processes
> >>>>>>> without mpirun their MPI_COMM_WORLD is the same as MPI_COMM_SELF. Is
> >>>>>>> there any way to change MPI_COMM_WORLD and set it to the
> >>>>>>> intracommunicator that I've created?
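The growth described in the one-by-one scheme above can be made concrete with a little arithmetic. Admitting the n-th process requires the n-1 processes already in the group to take part in MPI_Comm_accept, so growing a group from 1 to N processes takes N-1 join steps and 1 + 2 + ... + (N-1) = N(N-1)/2 accept participations in total. The C sketch below (hypothetical helper name `accept_participations`) just tallies that; it models the bookkeeping of the scheme and makes no MPI calls.

```c
/* Hypothetical helper: counts the MPI_Comm_accept participations needed
 * to grow a group from 1 to n_total processes with the one-by-one scheme.
 * When process k (k = 2..n_total) connects, the k - 1 processes already
 * in the group must all call MPI_Comm_accept. No MPI calls are made. */
unsigned long accept_participations(unsigned long n_total)
{
    unsigned long total = 0;
    for (unsigned long k = 2; k <= n_total; k++)
        total += k - 1;   /* k - 1 existing members accept process k */
    return total;         /* equals n_total * (n_total - 1) / 2 for n_total >= 1 */
}
```

For N = 65, the largest group that worked in the test above, this gives 64 join steps and 2080 accept participations, and each join involves one more process than the previous one. That is consistent with, though not proof of, the increasing per-call times reported after about 40 processes.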
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Grzegorz Maj
> >>>>>>> _______________________________________________
> >>>>>>> users mailing list
> >>>>>>> users_at_[hidden]
> >>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>
> >
>
>