
Subject: Re: [OMPI users] Multi-threading with OpenMPI ?
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-10-05 08:57:36


Are these threads running for long periods of time? I ask because
there typically are system limits on the number of pipes any one
process can open, which is what you appear to be hitting. You can
try two things (as the error message tells you :-)):

1. Set -mca opal_set_max_sys_limits 1 on your command line (or in the
environment). This tells OMPI to automatically raise the system limits
to the maximum allowed values (see the example below).

2. Check "ulimit" to see what you are allowed. You might need to talk
to your sysadmin about raising the limits.
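
For example (the application name and process count here are just
placeholders):

  mpirun -mca opal_set_max_sys_limits 1 -np 4 ./your_app
  ulimit -n    # current per-process limit on open file descriptors

Pipes and TCP connections both consume file descriptors, so the
"open files" value reported by ulimit -n is usually the one to raise.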

On Oct 5, 2009, at 1:33 AM, Ashika Umanga Umagiliya wrote:

> Greetings all,
>
> First of all thank you all for the help.
>
> I tried using locks and still I get following problems :
>
> 1) When multiple threads call MPI_Comm_spawn (sequentially or in
> parallel), some spawned processes hang in their
> "MPI_Init_thread(NULL,NULL,MPI_THREAD_MULTIPLE,&sup);"
> call. (I can see all of the spawned processes stacked up in the
> 'top' command.)
>
> 2) Randomly, the program (a web service) crashes with the error:
>
> "[umanga:06488] [[4594,0],0] ORTE_ERROR_LOG: The system limit on
> number of pipes a process can open was reached in file
> odls_default_module.c at line 218
> [umanga:06488] [[4594,0],0] ORTE_ERROR_LOG: The system limit on
> number of network connections a process can open was reached in file
> oob_tcp.c at line 447
> --------------------------------------------------------------------------
> Error: system limit exceeded on number of network connections that
> can be open
>
> This can be resolved by setting the mca parameter
> opal_set_max_sys_limits to 1,
> increasing your limit descriptor setting (using limit or ulimit
> commands),
> or asking the system administrator to increase the system limit.
> --------------------------------------------------------------------------"
>
> Any advice?
>
> Thank you,
> umanga
>
> Richard Treumann wrote:
>>
>> MPI_COMM_SELF is one example. The only task it contains is the
>> local task.
>>
>> The other case I had in mind is where there is a master doing all
>> spawns. The master is launched as an MPI "job", but it has only one
>> task. In that master, even MPI_COMM_WORLD is what I called a
>> "single task communicator".
>>
>> Because the collective spawn call is "collective" across only one
>> task in this case, it does not have the same sort of dependency on
>> what other tasks do.
>>
>> I think it is common for a single task master to have
>> responsibility for all spawns in the kind of model yours sounds
>> like. I did not study the conversation enough to know if you are
>> doing all spawn calls from a "single task communicator", and I was
>> trying to give a broadly useful explanation.
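>>
>> A minimal sketch of that pattern (the worker executable name and the
>> number of workers are placeholders): the master is launched as a
>> one-task job, so the spawn is collective across that single task only.
>>
>> /* master.c - launched with something like: mpirun -np 1 ./master */
>> #include <mpi.h>
>>
>> int main(int argc, char **argv)
>> {
>>     int provided;
>>     MPI_Comm workers;
>>
>>     MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
>>
>>     /* Collective only across this one task, so it does not wait on
>>        what any other task is doing. */
>>     MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
>>                    0, MPI_COMM_SELF, &workers, MPI_ERRCODES_IGNORE);
>>
>>     /* ... talk to the workers over the intercommunicator ... */
>>
>>     MPI_Comm_disconnect(&workers);
>>     MPI_Finalize();
>>     return 0;
>> }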
>>
>>
>> Dick Treumann - MPI Team
>> IBM Systems & Technology Group
>> Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
>> Tele (845) 433-7846 Fax (845) 433-8363
>>
>>
>> users-bounces_at_[hidden] wrote on 09/25/2009 02:59:04 AM:
>>
>> > Re: [OMPI users] Multi-threading with OpenMPI ?
>> > From: Ashika Umanga Umagiliya
>> > To: Open MPI Users
>> > Date: 09/25/2009 03:00 AM
>> > Sent by: users-bounces_at_[hidden]
>> > Please respond to Open MPI Users
>> >
>> > Thank you Dick for your detailed reply,
>> >
>> > I am sorry, could you explain more what you meant by "unless you are
>> > calling MPI_Comm_spawn on a single task communicator you would need
>> > to have a different input communicator for each thread that will
>> > make an MPI_Comm_spawn call"? I am confused by the term "single
>> > task communicator".
>> >
>> > Best Regards,
>> > umanga
>> >
>> > Richard Treumann wrote:
>> > It is dangerous to hold a local lock (like a mutex) across a
>> > blocking MPI call unless you can be 100% sure everything that must
>> > happen remotely will be completely independent of what is done with
>> > local locks & communication dependencies on other tasks.
>> >
>> > It is likely that an MPI_Comm_spawn call in which the spawning
>> > communicator is MPI_COMM_SELF would be safe to serialize with a
>> > mutex. But be careful and do not view this as an approach to making
>> > MPI applications thread safe in general. Also, unless you are
>> > calling MPI_Comm_spawn on a single task communicator, you would need
>> > to have a different input communicator for each thread that will
>> > make an MPI_Comm_spawn call. MPI requires that collective calls on a
>> > given communicator be made in the same order by all participating
>> > tasks.
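>> >
>> > As a rough sketch of the per-thread communicator idea, assuming
>> > pthreads (the function and worker names are invented): each thread
>> > duplicates MPI_COMM_SELF once, under a lock because the duplication
>> > is itself collective, and then spawns on its own communicator.
>> >
>> > #include <mpi.h>
>> > #include <pthread.h>
>> >
>> > static pthread_mutex_t dup_lock = PTHREAD_MUTEX_INITIALIZER;
>> >
>> > static void *spawner(void *arg)
>> > {
>> >     MPI_Comm mycomm, workers;
>> >
>> >     /* Serialize the duplications; afterwards each thread owns its
>> >        own single task communicator. */
>> >     pthread_mutex_lock(&dup_lock);
>> >     MPI_Comm_dup(MPI_COMM_SELF, &mycomm);
>> >     pthread_mutex_unlock(&dup_lock);
>> >
>> >     /* No two threads issue collectives on the same communicator. */
>> >     MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
>> >                    0, mycomm, &workers, MPI_ERRCODES_IGNORE);
>> >
>> >     /* ... use the intercommunicator, then clean up ... */
>> >     MPI_Comm_disconnect(&workers);
>> >     MPI_Comm_free(&mycomm);
>> >     return NULL;
>> > }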
>> >
>> > If there are two or more tasks making the MPI_Comm_spawn call
>> > collectively from multiple threads (even with per-thread input
>> > communicators) then using a local lock this way is pretty sure to
>> > deadlock at some point. Say task 0 serializes spawning threads as A
>> > then B and task 1 serializes them as B then A. The job will deadlock
>> > because task 0 cannot free its lock for thread A until task 1 makes
>> > the spawn call for thread A as well. That will never happen if task
>> > 1 is stuck in a lock that will not release until task 0 makes its
>> > call for thread B.
>> >
>> > When you look at the code for a particular task and consider thread
>> > interactions within the task, the use of the lock looks safe. It is
>> > only when you consider the dependencies on what other tasks are
>> > doing that the danger becomes clear. This particular case is pretty
>> > easy to see, but sometimes when there is a temptation to hold a local
>> > mutex across a blocking MPI call, the chain of dependencies that
>> > can lead to deadlock becomes very hard to predict.
>> >
>> > BTW - maybe this is obvious, but you also need to protect the logic
>> > which calls MPI_Init_thread to make sure you do not have a race in
>> > which 2 threads each race to test the flag for whether
>> > MPI_Init_thread has already been called. If two threads do:
>> > 1) if (MPI_Inited_flag == FALSE) {
>> > 2)   set MPI_Inited_flag
>> > 3)   MPI_Init_thread
>> > 4) }
>> > you have a couple of race conditions:
>> > 1) Two threads may both try to call MPI_Init_thread if one thread
>> > tests "if (MPI_Inited_flag == FALSE)" while the other is between
>> > statements 1 & 2.
>> > 2) If some thread tests "if (MPI_Inited_flag == FALSE)" while
>> > another thread is between statements 2 and 3, that thread could
>> > assume MPI_Init_thread is done and make the MPI_Comm_spawn call
>> > before the thread that is trying to initialize MPI manages to do it.
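>> >
>> > One way to close that race, as a small sketch assuming pthreads (the
>> > flag and lock names are invented):
>> >
>> > #include <mpi.h>
>> > #include <pthread.h>
>> >
>> > static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
>> > static int mpi_inited = 0;   /* read and written only under init_lock */
>> >
>> > static void ensure_mpi_initialized(void)
>> > {
>> >     pthread_mutex_lock(&init_lock);
>> >     if (!mpi_inited) {
>> >         int provided;
>> >         MPI_Init_thread(NULL, NULL, MPI_THREAD_MULTIPLE, &provided);
>> >         mpi_inited = 1;      /* set only after init has completed */
>> >     }
>> >     pthread_mutex_unlock(&init_lock);
>> > }
>> >
>> > Because the test and the flag update both happen while the lock is
>> > held, no thread can see the flag set before MPI_Init_thread returns.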
>> >
>> > Dick
>> >
>> >
>> > Dick Treumann - MPI Team
>> > IBM Systems & Technology Group
>> > Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
>> > Tele (845) 433-7846 Fax (845) 433-8363
>> >
>> >
>> > users-bounces_at_[hidden] wrote on 09/17/2009 11:36:48 PM:
>> >
>> > > Re: [OMPI users] Multi-threading with OpenMPI ?
>> > > From: Ralph Castain
>> > > To: Open MPI Users
>> > > Date: 09/17/2009 11:37 PM
>> > > Sent by: users-bounces_at_[hidden]
>> > > Please respond to Open MPI Users
>> > >
>> > > Only thing I can suggest is to place a thread lock around the
>> > > call to comm_spawn so that only one thread at a time can execute
>> > > that function. The call to mpi_init_thread is fine - you just need
>> > > to explicitly protect the call to comm_spawn.
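>> > >
>> > > A bare-bones sketch of that suggestion, assuming pthreads (the lock
>> > > and worker names are placeholders):
>> > >
>> > > #include <mpi.h>
>> > > #include <pthread.h>
>> > >
>> > > static pthread_mutex_t spawn_lock = PTHREAD_MUTEX_INITIALIZER;
>> > >
>> > > /* Any thread may call this; only one spawn executes at a time. */
>> > > static void spawn_one_worker(MPI_Comm *workers)
>> > > {
>> > >     pthread_mutex_lock(&spawn_lock);
>> > >     MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
>> > >                    0, MPI_COMM_SELF, workers, MPI_ERRCODES_IGNORE);
>> > >     pthread_mutex_unlock(&spawn_lock);
>> > > }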
>> > >
>> > >
>> >
>> >