
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Multi-threading with OpenMPI ?
From: Ashika Umanga Umagiliya (aumanga_at_[hidden])
Date: 2009-10-05 22:48:43


Ralph, thank you for your help.

I set "-mca opal_set_max_sys_limits 1" and my "ulimit" us "unlimited" ,
but still I get the errors.
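(Side note: a plain "ulimit" usually reports the file-size limit; the pipe and network-connection errors below typically come down to the per-process file-descriptor limit, which is "ulimit -n" or, programmatically, RLIMIT_NOFILE. A minimal, purely illustrative check using standard POSIX getrlimit, nothing Open MPI specific:)

/* Print the per-process open-descriptor limit (RLIMIT_NOFILE),
 * which is the limit the pipe/socket errors are really about.
 * RLIM_INFINITY may print as a very large number. */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    printf("open files: soft=%llu hard=%llu\n",
           (unsigned long long)rl.rlim_cur,
           (unsigned long long)rl.rlim_max);
    return 0;
}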
What's happening now is that for every user request (webservice request) a
new thread is created, and in that thread I spawn processes; these
newly spawned processes do the calculation in parallel.
I think I have to change the design so that I put the "Requests" in a
queue and execute one "parallel job" at a time, rather than running
multiple "parallel jobs" at once (running them all at once might
eventually run out of system resources).
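(Roughly the kind of redesign described above, as a minimal pthreads sketch rather than the actual webservice code; run_parallel_job() is a placeholder for the MPI_Comm_spawn + computation step:)

/* Web-service threads only enqueue requests; a single worker thread
 * dequeues and runs one parallel job at a time, so only one
 * MPI_Comm_spawn (and one set of child processes) is active at once. */
#include <pthread.h>

#define MAX_PENDING 64

typedef struct { int request_id; } request_t;

static request_t queue[MAX_PENDING];
static int head = 0, tail = 0, count = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  nonempty = PTHREAD_COND_INITIALIZER;

/* Placeholder: in the real code this spawns the processes and runs
 * the calculation for one request. */
static void run_parallel_job(const request_t *req)
{
    (void)req;
}

/* Called from each web-service request thread. */
void enqueue_request(request_t req)
{
    pthread_mutex_lock(&lock);
    if (count < MAX_PENDING) {          /* real code would handle a full queue */
        queue[tail] = req;
        tail = (tail + 1) % MAX_PENDING;
        count++;
        pthread_cond_signal(&nonempty);
    }
    pthread_mutex_unlock(&lock);
}

/* Single worker thread: serializes all parallel jobs. */
void *job_worker(void *unused)
{
    (void)unused;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (count == 0)
            pthread_cond_wait(&nonempty, &lock);
        request_t req = queue[head];
        head = (head + 1) % MAX_PENDING;
        count--;
        pthread_mutex_unlock(&lock);

        run_parallel_job(&req);   /* one job at a time */
    }
    return NULL;
}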

Thank you ,
umanga

Ralph Castain wrote:
> Are these threads running for long periods of time? I ask because
> there typically are system limits on the number of pipes any one
> process can open, which is what you appear to be hitting. You can
> check two things (as the error message tells you :-)):
>
> 1. set -mca opal_set_max_sys_limits 1 on your cmd line (or in
> environ). This will tell OMPI to automatically set the system to the
> max allowed values
>
> 2. check "ulimit" to see what you are allowed. You might need to talk
> to you sys admin about upping limits.
>
>
> On Oct 5, 2009, at 1:33 AM, Ashika Umanga Umagiliya wrote:
>
>> Greetings all,
>>
>> First of all thank you all for the help.
>>
>> I tried using locks and I still get the following problems:
>>
>> 1) When multiple threads call MPI_Comm_spawn (sequentially or in
>> parallel), some spawned processes hang in their
>> "MPI_Init_thread(NULL,NULL,MPI_THREAD_MULTIPLE,&sup);"
>> call. (I can see all of the spawned processes listed in the
>> 'top' command.)
>>
>> 2) Randomly, the program (webservice) crashes with the error:
>>
>> "[umanga:06488] [[4594,0],0] ORTE_ERROR_LOG: The system limit on
>> number of pipes a process can open was reached in file
>> odls_default_module.c at line 218
>> [umanga:06488] [[4594,0],0] ORTE_ERROR_LOG: The system limit on
>> number of network connections a process can open was reached in file
>> oob_tcp.c at line 447
>> --------------------------------------------------------------------------
>> Error: system limit exceeded on number of network connections that
>> can be open
>>
>> This can be resolved by setting the mca parameter
>> opal_set_max_sys_limits to 1,
>> increasing your limit descriptor setting (using limit or ulimit
>> commands),
>> or asking the system administrator to increase the system limit.
>> --------------------------------------------------------------------------"
>>
>> Any advice?
>>
>> Thank you,
>> umanga
>>
>> Richard Treumann wrote:
>>>
>>> MPI_COMM_SELF is one example. The only task it contains is the local
>>> task.
>>>
>>> The other case I had in mind is where there is a master doing all
>>> spawns. Master is launched as an MPI "job" but it has only one task.
>>> In that master, even MPI_COMM_WORLD is what I called a "single task
>>> communicator".
>>>
>>> Because the collective spawn call is "collective" across only one
>>> task in this case, it does not have the same sort of dependency on
>>> what other tasks do.
>>>
>>> I think it is common for a single-task master to have responsibility
>>> for all spawns in the kind of model yours sounds like. I did not
>>> study the conversation enough to know if you are doing all spawn
>>> calls from a "single task communicator", and I was trying to give a
>>> broadly useful explanation.
>>>
>>>
>>> Dick Treumann - MPI Team
>>> IBM Systems & Technology Group
>>> Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
>>> Tele (845) 433-7846 Fax (845) 433-8363
>>>
>>>
>>> users-bounces_at_[hidden] wrote on 09/25/2009 02:59:04 AM:
>>>
>>> >
>>> > Re: [OMPI users] Multi-threading with OpenMPI ?
>>> >
>>> > Ashika Umanga Umagiliya
>>> >
>>> > to:
>>> >
>>> > Open MPI Users
>>> >
>>> > 09/25/2009 03:00 AM
>>> >
>>> > Sent by:
>>> >
>>> > users-bounces_at_[hidden]
>>> >
>>> > Please respond to Open MPI Users
>>> >
>>> > Thank you Dick for your detailed reply,
>>> >
>>> > I am sorry, could you explain a bit more what you meant by "unless you are
>>> > calling MPI_Comm_spawn on a single task communicator you would need
>>> > to have a different input communicator for each thread that will
>>> > make an MPI_Comm_spawn call"? I am confused by the term "single
>>> > task communicator".
>>> >
>>> > Best Regards,
>>> > umanga
>>> >
>>> > Richard Treumann wrote:
>>> > It is dangerous to hold a local lock (like a mutex) across a
>>> > blocking MPI call unless you can be 100% sure everything that must
>>> > happen remotely will be completely independent of what is done with
>>> > local locks & communication dependencies on other tasks.
>>> >
>>> > It is likely that an MPI_Comm_spawn call in which the spawning
>>> > communicator is MPI_COMM_SELF would be safe to serialize with a
>>> > mutex. But be careful and do not view this as an approach to making
>>> > MPI applications thread safe in general. Also, unless you are
>>> > calling MPI_Comm_spawn on a single task communicator you would need
>>> > to have a different input communicator for each thread that will
>>> > make an MPI_Comm_spawn call. MPI requires that collective calls on a
>>> > given communicator be made in the same order by all participating
>>> tasks.
>>> >
>>> > If there are two or more tasks making the MPI_Comm_spawn call
>>> > collectively from multiple threads (even with per-thread input
>>> > communicators) then using a local lock this way is pretty sure to
>>> > deadlock at some point. Say task 0 serializes spawning threads as A
>>> > then B and task 1 serializes them as B then A. The job will deadlock
>>> > because task 0 cannot free its lock for thread A until task 1 makes
>>> > the spawn call for thread A as well. That will never happen if task
>>> > 1 is stuck in a lock that will not release until task 0 makes its
>>> > call for thread B.
>>> >
>>> > When you look at the code for a particular task and consider thread
>>> > interactions within the task, the use of the lock looks safe. It is
>>> > only when you consider the dependencies on what other tasks are
>>> > doing that the danger becomes clear. This particular case is pretty
>>> > easy to see, but sometimes when there is a temptation to hold a local
>>> > mutex across a blocking MPI call, the chain of dependencies that
>>> > can lead to deadlock becomes very hard to predict.
>>> >
>>> > BTW - maybe this is obvious but you also need to protect the logic
>>> > which calls MPI_Init_thread to make sure you do not have a race in
>>> > which 2 threads each race to test the flag for whether
>>> > MPI_Init_thread has already been called. If two threads do:
>>> > 1) if (MPI_Inited_flag == FALSE) {
>>> > 2) set MPI_Inited_flag
>>> > 3) MPI_Init_thread
>>> > 4) }
>>> > You have a couple of race conditions.
>>> > 1) Two threads may both try to call MPI_Init_thread if one thread
>>> > tests "if (MPI_Inited_flag == FALSE)" while the other is between
>>> > statements 1 & 2.
>>> > 2) If some thread tests "if (MPI_Inited_flag == FALSE)" while
>>> > another thread is between statements 2 and 3, that thread could
>>> > assume MPI_Init_thread is done and make the MPI_Comm_spawn call
>>> > before the thread that is trying to initialize MPI manages to do it.
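(A thread-safe version of the initialization check described above could look roughly like the following sketch: the test of the flag and the call to MPI_Init_thread happen under one mutex, so neither race can occur. Names and error handling are illustrative only.)

/* Sketch: protect both the "already initialized" test and the
 * MPI_Init_thread call with a single mutex, so no thread can slip
 * past while another is still initializing. */
#include <pthread.h>
#include <mpi.h>

static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
static int mpi_inited = 0;

void ensure_mpi_initialized(void)
{
    pthread_mutex_lock(&init_lock);
    if (!mpi_inited) {
        int provided;
        MPI_Init_thread(NULL, NULL, MPI_THREAD_MULTIPLE, &provided);
        /* Real code should check provided >= MPI_THREAD_MULTIPLE. */
        mpi_inited = 1;
    }
    pthread_mutex_unlock(&init_lock);
}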
>>> >
>>> > Dick
>>> >
>>> >
>>> > Dick Treumann - MPI Team
>>> > IBM Systems & Technology Group
>>> > Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
>>> > Tele (845) 433-7846 Fax (845) 433-8363
>>> >
>>> >
>>> > users-bounces_at_[hidden] wrote on 09/17/2009 11:36:48 PM:
>>> >
>>> > > [image removed]
>>> > >
>>> > > Re: [OMPI users] Multi-threading with OpenMPI ?
>>> > >
>>> > > Ralph Castain
>>> > >
>>> > > to:
>>> > >
>>> > > Open MPI Users
>>> > >
>>> > > 09/17/2009 11:37 PM
>>> > >
>>> > > Sent by:
>>> > >
>>> > > users-bounces_at_[hidden]
>>> > >
>>> > > Please respond to Open MPI Users
>>> > >
>>> > > Only thing I can suggest is to place a thread lock around the
>>> call to
>>> > > comm_spawn so that only one thread at a time can execute that
>>> > > function. The call to mpi_init_thread is fine - you just need to
>>> > > explicitly protect the call to comm_spawn.
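(A minimal sketch of that suggestion, combined with Dick's later point that MPI_COMM_SELF is the case most likely to be safe to serialize; the "./worker" command and maxprocs of 4 are placeholders, not values from this thread.)

/* One global mutex so only one thread at a time is inside
 * MPI_Comm_spawn; the spawning communicator is MPI_COMM_SELF. */
#include <pthread.h>
#include <mpi.h>

static pthread_mutex_t spawn_lock = PTHREAD_MUTEX_INITIALIZER;

MPI_Comm spawn_workers(void)
{
    MPI_Comm children;
    int errcodes[4];

    pthread_mutex_lock(&spawn_lock);
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &children, errcodes);
    pthread_mutex_unlock(&spawn_lock);

    return children;   /* intercommunicator to the spawned processes */
}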
>>> > >
>>> > >
>>> >
>>> >
>>>
>>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users