Ralph, thank you for your help.

I set "-mca opal_set_max_sys_limits 1" and my "ulimit" is "unlimited", but I still get the errors.
What's happening now is: for every user request (web service request) a new thread is created, and in that thread I spawn processes; the newly spawned processes do the calculation in parallel.
I think I have to change the design so that I put the requests in a queue and execute one parallel job at a time, rather than running multiple parallel jobs at once (which might eventually run out of system resources).

Thank you,
umanga




Ralph Castain wrote:
Are these threads running for long periods of time? I ask because there typically are system limits on the number of pipes any one process can open, which appears to be what you are hitting. You can check two things (as the error message tells you :-)):

1. Set "-mca opal_set_max_sys_limits 1" on your command line (or in the environment). This tells OMPI to automatically raise the system limits to the maximum allowed values.

2. Check "ulimit" to see what you are allowed. You might need to talk to your sys admin about raising the limits.
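For reference, a quick way to inspect and raise the relevant limits from a shell session (the `mpirun` line at the end is illustrative; `./your_app` is a placeholder):

```shell
# Show the per-process limits the error messages refer to:
ulimit -n          # max open file descriptors (pipes count against this)
ulimit -u          # max user processes

# Raise the soft descriptor limit up to the hard limit for this session:
ulimit -S -n "$(ulimit -H -n)"
ulimit -n

# Then run with the MCA parameter set (illustrative command line):
# mpirun -mca opal_set_max_sys_limits 1 -np 4 ./your_app
```

Raising the hard limit itself requires administrator changes (e.g. in /etc/security/limits.conf on Linux).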


On Oct 5, 2009, at 1:33 AM, Ashika Umanga Umagiliya wrote:

Greetings all,

First of all thank you all for the help.

I tried using locks, but I still get the following problems:

1) When multiple threads call MPI_Comm_spawn (sequentially or in parallel), some spawned processes hang in their
"MPI_Init_thread(NULL,NULL,MPI_THREAD_MULTIPLE,&sup);"
call. (I can see all of the spawned processes stacked up in the 'top' command.)

2) Randomly, the program (web service) crashes with the error:

"[umanga:06488] [[4594,0],0] ORTE_ERROR_LOG: The system limit on number of pipes a process can open was reached in file odls_default_module.c at line 218
[umanga:06488] [[4594,0],0] ORTE_ERROR_LOG: The system limit on number of network connections a process can open was reached in file oob_tcp.c at line 447
--------------------------------------------------------------------------
Error: system limit exceeded on number of network connections that can be open

This can be resolved by setting the mca parameter opal_set_max_sys_limits to 1,
increasing your limit descriptor setting (using limit or ulimit commands),
or asking the system administrator to increase the system limit.
--------------------------------------------------------------------------"

Any advice?

Thank you,
umanga

Richard Treumann wrote:

MPI_COMM_SELF is one example. The only task it contains is the local task.

The other case I had in mind is where there is a master doing all the spawns. The master is launched as an MPI "job", but it has only one task. In that master, even MPI_COMM_WORLD is what I called a "single task communicator".

Because the spawn call is "collective" across only one task in this case, it does not have the same sort of dependency on what other tasks do.

I think it is common for a single-task master to have responsibility for all spawns in the kind of model yours sounds like. I did not study the conversation closely enough to know whether you are doing all spawn calls from a "single task communicator"; I was trying to give a broadly useful explanation.


Dick Treumann - MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363


users-bounces@open-mpi.org wrote on 09/25/2009 02:59:04 AM:

> Re: [OMPI users] Multi-threading with OpenMPI ?
> Ashika Umanga Umagiliya
> to: Open MPI Users
> 09/25/2009 03:00 AM
>
> Thank you Dick for your detailed reply,
>
> I am sorry, could you explain more what you meant by "unless you are
> calling MPI_Comm_spawn on a single task communicator you would need
> to have a different input communicator for each thread that will
> make an MPI_Comm_spawn call"? I am confused by the term "single
> task communicator".
>
> Best Regards,
> umanga
>
> Richard Treumann wrote:

> It is dangerous to hold a local lock (like a mutex) across a
> blocking MPI call unless you can be 100% sure everything that must
> happen remotely will be completely independent of what is done with
> local locks & communication dependencies on other tasks.
>
> It is likely that an MPI_Comm_spawn call in which the spawning
> communicator is MPI_COMM_SELF would be safe to serialize with a
> mutex. But be careful, and do not view this as an approach to making
> MPI applications thread safe in general. Also, unless you are
> calling MPI_Comm_spawn on a single task communicator, you would need
> to have a different input communicator for each thread that will
> make an MPI_Comm_spawn call. MPI requires that collective calls on a
> given communicator be made in the same order by all participating tasks.
>
> If there are two or more tasks making the MPI_Comm_spawn call
> collectively from multiple threads (even with per-thread input
> communicators) then using a local lock this way is pretty sure to
> deadlock at some point. Say task 0 serializes spawning threads as A
> then B and task 1 serializes them as B then A. The job will deadlock
> because task 0 cannot free its lock for thread A until task 1 makes
> the spawn call for thread A as well. That will never happen if task
> 1 is stuck in a lock that will not release until task 0 makes its
> call for thread B.
>
> When you look at the code for a particular task and consider thread
> interactions within the task, the use of the lock looks safe. It is
> only when you consider the dependencies on what other tasks are
> doing that the danger becomes clear. This particular case is pretty
> easy to see, but sometimes when there is a temptation to hold a local
> mutex across a blocking MPI call, the chain of dependencies that
> can lead to deadlock becomes very hard to predict.
>
> BTW - maybe this is obvious, but you also need to protect the logic
> which calls MPI_Init_thread to make sure you do not have a race in
> which two threads each race to test the flag for whether
> MPI_Init_thread has already been called. If two threads do:
> 1) if (MPI_Inited_flag == FALSE) {
> 2)     set MPI_Inited_flag
> 3)     MPI_Init_thread
> 4) }
> you have a couple of race conditions:
> 1) Two threads may both try to call MPI_Init_thread if one thread
> tests "if (MPI_Inited_flag == FALSE)" while the other is between
> statements 1 & 2.
> 2) If some thread tests "if (MPI_Inited_flag == FALSE)" while
> another thread is between statements 2 and 3, that thread could
> assume MPI_Init_thread is done and make the MPI_Comm_spawn call
> before the thread that is trying to initialize MPI manages to do it.
>
> Dick
>
>
> Dick Treumann - MPI Team
> IBM Systems & Technology Group
> Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
> Tele (845) 433-7846 Fax (845) 433-8363
>
>
> users-bounces@open-mpi.org wrote on 09/17/2009 11:36:48 PM:
>
> > Re: [OMPI users] Multi-threading with OpenMPI ?
> > Ralph Castain
> > to: Open MPI Users
> > 09/17/2009 11:37 PM
> >
> >
> > The only thing I can suggest is to place a thread lock around the call to
> > comm_spawn so that only one thread at a time can execute that
> > function. The call to MPI_Init_thread is fine - you just need to
> > explicitly protect the call to comm_spawn.
> >
> >

>
>
> _______________________________________________
> users mailing list
> users@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
