Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Dynamic processes connection and segfault on MPI_Comm_accept
From: Grzegorz Maj (maju3_at_[hidden])
Date: 2010-07-07 12:17:30

2010/7/7 Ralph Castain <rhc_at_[hidden]>:
> On Jul 6, 2010, at 8:48 AM, Grzegorz Maj wrote:
>> Hi Ralph,
>> sorry for the late response, but I couldn't find free time to play
>> with this. Finally I've applied the patch you prepared. I've launched
>> my processes in the way you've described and I think it's working as
>> you expected. None of my processes runs the orted daemon and they can
>> perform MPI operations. Unfortunately I'm still hitting the 65
>> processes issue :(
>> Maybe I'm doing something wrong.
>> I attach my source code. If anybody could take a look at it, I would
>> be grateful.
>> When I run that code with clients_count <= 65 everything works fine:
>> all the processes create a common grid, exchange some information and
>> disconnect.
>> When I set clients_count > 65 the 66th process crashes on
>> MPI_Comm_connect (segmentation fault).
> I didn't have time to check the code, but my guess is that you are still hitting some kind of file descriptor or other limit. Check to see what your limits are - usually "ulimit" will tell you.

My limits are:
time(seconds) unlimited
file(blocks) unlimited
data(kb) unlimited
stack(kb) 10240
coredump(blocks) 0
memory(kb) unlimited
locked memory(kb) 64
process 200704
nofiles 1024
vmemory(kb) unlimited
locks unlimited

Which one do you think could be responsible for that?
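
For reference, the same soft limits that "ulimit" reports can be read from inside a process with POSIX getrlimit(). This is a minimal sketch, assuming a POSIX system; the helper names (soft_limit, print_limit, print_connection_limits) are hypothetical and are not part of Open MPI:

```c
#include <stdio.h>
#include <sys/resource.h>

/* Read the soft limit for one resource.
 * Returns 0 on success, -1 if getrlimit() fails.
 * RLIM_INFINITY corresponds to "unlimited" in the ulimit output. */
int soft_limit(int resource, rlim_t *out)
{
    struct rlimit rl;
    if (getrlimit(resource, &rl) != 0)
        return -1;
    *out = rl.rlim_cur;
    return 0;
}

/* Print one limit in a ulimit-like form. */
void print_limit(const char *label, int resource)
{
    rlim_t value;
    if (soft_limit(resource, &value) != 0)
        printf("%-16s (unavailable)\n", label);
    else if (value == RLIM_INFINITY)
        printf("%-16s unlimited\n", label);
    else
        printf("%-16s %llu\n", label, (unsigned long long)value);
}

/* Print the limits most likely to matter for a process that holds
 * many connections. Note that getrlimit() reports sizes in bytes,
 * while the listing above shows kilobytes. */
void print_connection_limits(void)
{
    print_limit("nofiles", RLIMIT_NOFILE);
    print_limit("stack(bytes)", RLIMIT_STACK);
    print_limit("data(bytes)", RLIMIT_DATA);
    print_limit("coredump(bytes)", RLIMIT_CORE);
}
```

Calling print_connection_limits() early in the client would show the limits the process actually inherited from whatever launched it, which may differ from those of an interactive shell.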

I tried running all 66 processes on one machine and also spreading them
across several machines, and it always crashes the same way on the 66th
process.

>> Another thing I would like to know is whether it's normal that any of my
>> processes calling MPI_Comm_connect or MPI_Comm_accept while the other
>> side is not ready eats up a full CPU.
> Yes - the waiting process is polling in a tight loop waiting for the connection to be made.
>> Any help would be appreciated,
>> Grzegorz Maj
>> 2010/4/24 Ralph Castain <rhc_at_[hidden]>:
>>> Actually, OMPI is distributed with a daemon that does pretty much what you
>>> want. Check out "man ompi-server". I originally wrote that code to support
>>> cross-application MPI publish/subscribe operations, but we can utilize it
>>> here too. Have to blame me for not making it more publicly known.
>>> The attached patch upgrades ompi-server and modifies the singleton startup
>>> to provide your desired support. This solution works in the following
>>> manner:
>>> 1. launch "ompi-server -report-uri <filename>". This starts a persistent
>>> daemon called "ompi-server" that acts as a rendezvous point for
>>> independently started applications.  The problem with starting different
>>> applications and wanting them to MPI connect/accept lies in the need to have
>>> the applications find each other. If they can't discover contact info for
>>> the other app, then they can't wire up their interconnects. The
>>> "ompi-server" tool provides that rendezvous point. I don't like that
>>> comm_accept segfaulted - should have just error'd out.
>>> 2. set "OMPI_MCA_orte_server=file:<filename>" in the environment where you
>>> will start your processes. This will allow your singleton processes to find
>>> the ompi-server. I automatically also set the envar to connect the MPI
>>> publish/subscribe system for you.
>>> 3. run your processes. As they think they are singletons, they will detect
>>> the presence of the above envar and automatically connect themselves to the
>>> "ompi-server" daemon. This provides each process with the ability to perform
>>> any MPI-2 operation.
>>> I tested this on my machines and it worked, so hopefully it will meet your
>>> needs. You only need to run one "ompi-server" period, so long as you locate
>>> it where all of the processes can find the contact file and can open a TCP
>>> socket to the daemon. There is a way to knit multiple ompi-servers into a
>>> broader network (e.g., to connect processes that cannot directly access a
>>> server due to network segmentation), but it's a tad tricky - let me know if
>>> you require it and I'll try to help.
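
Step 2 above can be sanity-checked from inside a client before it calls MPI_Init. This is a minimal sketch, assuming the envar value has the "file:<filename>" form shown in step 2; the helper names (uri_file_from_env, contact_file_ready) are hypothetical, and this does not reproduce whatever parsing Open MPI does internally:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Case-insensitive prefix test, so both "file:" and "FILE:" match. */
static int has_prefix_nocase(const char *s, const char *prefix)
{
    while (*prefix != '\0') {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/* Return the filename part of a "file:<filename>" envar value,
 * or NULL if the variable is unset or does not have that form. */
const char *uri_file_from_env(const char *envar)
{
    assert(envar != NULL);
    const char *value = getenv(envar);
    if (value == NULL)
        return NULL;
    if (!has_prefix_nocase(value, "file:"))
        return NULL;
    const char *path = value + strlen("file:");
    return (*path != '\0') ? path : NULL;
}

/* Return 1 if the contact file can be opened and is non-empty,
 * 0 otherwise. */
int contact_file_ready(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return 0;
    int c = fgetc(f);
    fclose(f);
    return c != EOF;
}
```

A client could call uri_file_from_env("OMPI_MCA_orte_server") and then contact_file_ready() on the result, and report a clear error if either check fails instead of going on to attempt the connect.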
>>> If you have trouble wiring them all into a single communicator, you might
>>> ask separately about that and see if one of our MPI experts can provide
>>> advice (I'm just the RTE grunt).
>>> HTH - let me know how this works for you and I'll incorporate it into future
>>> OMPI releases.
>>> Ralph
>>> On Apr 24, 2010, at 1:49 AM, Krzysztof Zarzycki wrote:
>>> Hi Ralph,
>>> I'm Krzysztof and I'm working with Grzegorz Maj on this small
>>> project/experiment of ours.
>>> We definitely would like to give your patch a try. But could you please
>>> explain your solution a little more?
>>> You still would like to start one mpirun per mpi grid, and then have
>>> processes started by us to join the MPI comm?
>>> It is a good solution of course.
>>> But it would be especially preferable to have one daemon running
>>> persistently on our "entry" machine that can handle several mpi grid starts.
>>> Can your patch help us this way too?
>>> Thanks for your help!
>>> Krzysztof
>>> On 24 April 2010 03:51, Ralph Castain <rhc_at_[hidden]> wrote:
>>>> In thinking about this, my proposed solution won't entirely fix the
>>>> problem - you'll still wind up with all those daemons. I believe I can
>>>> resolve that one as well, but it would require a patch.
>>>> Would you like me to send you something you could try? Might take a couple
>>>> of iterations to get it right...
>>>> On Apr 23, 2010, at 12:12 PM, Ralph Castain wrote:
>>>>> Hmmm....I -think- this will work, but I cannot guarantee it:
>>>>> 1. launch one process (can just be a spinner) using mpirun that includes
>>>>> the following option:
>>>>> mpirun -report-uri file
>>>>> where file is some filename that mpirun can create and insert its
>>>>> contact info into it. This can be a relative or absolute path. This process
>>>>> must remain alive throughout your application - doesn't matter what it does.
>>>>> Its purpose is solely to keep mpirun alive.
>>>>> 2. set OMPI_MCA_dpm_orte_server=FILE:file in your environment, where
>>>>> "file" is the filename given above. This will tell your processes how to
>>>>> find mpirun, which is acting as a meeting place to handle the connect/accept
>>>>> operations.
>>>>> Now run your processes, and have them connect/accept to each other.
>>>>> The reason I cannot guarantee this will work is that these processes
>>>>> will all have the same rank && name since they all start as singletons.
>>>>> Hence, connect/accept is likely to fail.
>>>>> But it -might- work, so you might want to give it a try.
>>>>> On Apr 23, 2010, at 8:10 AM, Grzegorz Maj wrote:
>>>>>> To be more precise: by 'server process' I mean some process that I
>>>>>> could run once on my system and it could help in creating those
>>>>>> groups.
>>>>>> My typical scenario is:
>>>>>> 1. run N separate processes, each without mpirun
>>>>>> 2. connect them into MPI group
>>>>>> 3. do some job
>>>>>> 4. exit all N processes
>>>>>> 5. goto 1
>>>>>> 2010/4/23 Grzegorz Maj <maju3_at_[hidden]>:
>>>>>>> Thank you Ralph for your explanation.
>>>>>>> And, apart from that descriptors' issue, is there any other way to
>>>>>>> solve my problem, i.e. to run separately a number of processes,
>>>>>>> without mpirun and then to collect them into an MPI intracomm group?
>>>>>>> If I for example would need to run some 'server process' (even using
>>>>>>> mpirun) for this task, that's OK. Any ideas?
>>>>>>> Thanks,
>>>>>>> Grzegorz Maj
>>>>>>> 2010/4/18 Ralph Castain <rhc_at_[hidden]>:
>>>>>>>> Okay, but here is the problem. If you don't use mpirun, and are not
>>>>>>>> operating in an environment we support for "direct" launch (i.e., starting
>>>>>>>> processes outside of mpirun), then every one of those processes thinks it is
>>>>>>>> a singleton - yes?
>>>>>>>> What you may not realize is that each singleton immediately
>>>>>>>> fork/exec's an orted daemon that is configured to behave just like mpirun.
>>>>>>>> This is required in order to support MPI-2 operations such as
>>>>>>>> MPI_Comm_spawn, MPI_Comm_connect/accept, etc.
>>>>>>>> So if you launch 64 processes that think they are singletons, then
>>>>>>>> you have 64 copies of orted running as well. This eats up a lot of file
>>>>>>>> descriptors, which is probably why you are hitting this 65 process limit -
>>>>>>>> your system is probably running out of file descriptors. You might check your
>>>>>>>> system limits and see if you can get them revised upward.
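
One way to test the descriptor theory above is to count how many descriptors a process actually holds and compare that with its "nofiles" limit. This is a minimal sketch, assuming a POSIX system; count_open_fds is a hypothetical helper, and the 65536 cap on the probe range is an arbitrary choice for the case where the limit is unlimited or very large:

```c
#include <fcntl.h>
#include <sys/resource.h>

/* Count the descriptors currently open in this process by probing
 * every descriptor number below the soft RLIMIT_NOFILE limit with
 * fcntl(F_GETFD), which fails for numbers that are not open.
 * Returns the count, or -1 if the limit cannot be read. */
long count_open_fds(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    rlim_t max = rl.rlim_cur;
    if (max == RLIM_INFINITY || max > 65536)
        max = 65536; /* arbitrary cap on the probe range */

    long open_count = 0;
    for (rlim_t fd = 0; fd < max; fd++) {
        if (fcntl((int)fd, F_GETFD) != -1)
            open_count++;
    }
    return open_count;
}
```

Logging count_open_fds() in the accepting process after each successful connection would show whether the count is approaching the limit as the group nears 65 processes.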
>>>>>>>> On Apr 17, 2010, at 4:24 PM, Grzegorz Maj wrote:
>>>>>>>>> Yes, I know. The problem is that I need to use some special way for
>>>>>>>>> running my processes provided by the environment in which I'm working
>>>>>>>>> and unfortunately I can't use mpirun.
>>>>>>>>> 2010/4/18 Ralph Castain <rhc_at_[hidden]>:
>>>>>>>>>> Guess I don't understand why you can't use mpirun - all it does is
>>>>>>>>>> start things, provide a means to forward io, etc. It mainly sits there
>>>>>>>>>> quietly without using any cpu unless required to support the job.
>>>>>>>>>> Sounds like it would solve your problem. Otherwise, I know of no
>>>>>>>>>> way to get all these processes into comm_world.
>>>>>>>>>> On Apr 17, 2010, at 2:27 PM, Grzegorz Maj wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>> I'd like to dynamically create a group of processes communicating via
>>>>>>>>>>> MPI. Those processes need to be run without mpirun and create
>>>>>>>>>>> intracommunicator after the startup. Any ideas how to do this
>>>>>>>>>>> efficiently?
>>>>>>>>>>> I came up with a solution in which the processes are connecting one by
>>>>>>>>>>> one using MPI_Comm_connect, but unfortunately all the processes that
>>>>>>>>>>> are already in the group need to call MPI_Comm_accept. This means that
>>>>>>>>>>> when the n-th process wants to connect I need to collect all the n-1
>>>>>>>>>>> processes on the MPI_Comm_accept call. After I run about 40 processes
>>>>>>>>>>> every subsequent call takes more and more time, which I'd like to
>>>>>>>>>>> avoid.
>>>>>>>>>>> Another problem in this solution is that when I try to connect 66-th
>>>>>>>>>>> process the root of the existing group segfaults on MPI_Comm_accept.
>>>>>>>>>>> Maybe it's my bug, but it's weird as everything works fine for at most
>>>>>>>>>>> 65 processes. Is there any limitation I don't know about?
>>>>>>>>>>> My last question is about MPI_COMM_WORLD. When I run my processes
>>>>>>>>>>> without mpirun their MPI_COMM_WORLD is the same as MPI_COMM_SELF. Is
>>>>>>>>>>> there any way to change MPI_COMM_WORLD and set it to the
>>>>>>>>>>> intracommunicator that I've created?
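
The growing cost described above follows from the join pattern itself: when the n-th process connects, all n-1 existing members take part in MPI_Comm_accept, so the number of accept participations grows quadratically with the group size. This is a small sketch of that arithmetic only; the function names are hypothetical, and it models call counts, not actual timings:

```c
#include <assert.h>

/* Number of already-connected processes that must be in
 * MPI_Comm_accept when the n-th process (counting from 1) joins:
 * all n-1 existing members. */
long acceptors_for_join(long n)
{
    assert(n >= 1);
    return n - 1;
}

/* Total accept participations needed to build a group of `total`
 * processes one at a time: 0 + 1 + ... + (total - 1),
 * which equals total * (total - 1) / 2. */
long total_accept_calls(long total)
{
    long sum = 0;
    for (long n = 1; n <= total; n++)
        sum += acceptors_for_join(n);
    return sum;
}
```

By this count, building a group of 40 processes one at a time involves 780 accept participations in total, and a group of 66 involves 2145.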
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Grzegorz Maj
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> users mailing list
>>>>>>>>>>> users_at_[hidden]
>> <client.c><server.c>