Greetings all,

First of all thank you all for the help.

I tried using locks, and I still get the following problems:

1) When multiple threads call MPI_Comm_spawn (sequentially or in parallel), some of the spawned processes hang in their
"MPI_Init_thread(NULL,NULL,MPI_THREAD_MULTIPLE,&sup);"
call. (I can see all of the spawned processes stacked up in the 'top' command.)

2) Randomly, the program (a web service) crashes with the error:

"[umanga:06488] [[4594,0],0] ORTE_ERROR_LOG: The system limit on number of pipes a process can open was reached in file odls_default_module.c at line 218
[umanga:06488] [[4594,0],0] ORTE_ERROR_LOG: The system limit on number of network connections a process can open was reached in file oob_tcp.c at line 447
--------------------------------------------------------------------------
Error: system limit exceeded on number of network connections that can be open

This can be resolved by setting the mca parameter opal_set_max_sys_limits to 1,
increasing your limit descriptor setting (using limit or ulimit commands),
or asking the system administrator to increase the system limit.
--------------------------------------------------------------------------"
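If I read the message right, applying the remedies it names would look something like this (sketch only; the limit value and executable name are placeholders):

   # raise the per-process descriptor limit in the launching shell
   ulimit -n 4096

   # or ask Open MPI to raise system limits itself, per the message
   mpirun --mca opal_set_max_sys_limits 1 -np 1 ./webservice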

Any advice?

Thank you,
umanga

Richard Treumann wrote:

MPI_COMM_SELF is one example. The only task it contains is the local task.

The other case I had in mind is where there is a master doing all spawns. Master is launched as an MPI "job" but it has only one task. In that master, even MPI_COMM_WORLD is what I called a "single task communicator".

Because the collective spawn call is "collective" across only one task in this case, it does not have the same sort of dependency on what other tasks do.

I think it is common for a single-task master to have responsibility for all spawns in the kind of model yours sounds like. I did not study the conversation closely enough to know whether you are doing all spawn calls from a "single task communicator", so I was trying to give a broadly useful explanation.
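
To make "single task communicator" concrete, here is a tiny check (a sketch only, not specific to your code):

   #include <mpi.h>
   #include <stdio.h>

   int main(int argc, char **argv)
   {
       int world, self;
       MPI_Init(&argc, &argv);
       MPI_Comm_size(MPI_COMM_WORLD, &world); /* 1 when run as "mpirun -np 1 ./master" */
       MPI_Comm_size(MPI_COMM_SELF, &self);   /* always 1 */
       printf("WORLD size=%d  SELF size=%d\n", world, self);
       MPI_Finalize();
       return 0;
   }

In both cases the communicator contains exactly one task, so a collective call on it involves no other task.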


Dick Treumann - MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363


users-bounces@open-mpi.org wrote on 09/25/2009 02:59:04 AM:

> Re: [OMPI users] Multi-threading with OpenMPI ?
> Ashika Umanga Umagiliya to Open MPI Users, 09/25/2009 03:00 AM

>
> Thank you Dick for your detailed reply,
>
> I am sorry, could you explain more what you meant by "unless you are
> calling MPI_Comm_spawn on a single task communicator you would need
> to have a different input communicator for each thread that will
> make an MPI_Comm_spawn call"? I am confused by the term "single
> task communicator".
>
> Best Regards,
> umanga
>
> Richard Treumann wrote:

> It is dangerous to hold a local lock (like a mutex) across a
> blocking MPI call unless you can be 100% sure everything that must
> happen remotely will be completely independent of what is done with
> local locks & communication dependencies on other tasks.
>
> It is likely that an MPI_Comm_spawn call in which the spawning
> communicator is MPI_COMM_SELF would be safe to serialize with a
> mutex. But be careful and do not view this as an approach to making
> MPI applications thread safe in general. Also, unless you are
> calling MPI_Comm_spawn on a single task communicator you would need
> to have a different input communicator for each thread that will
> make an MPI_Comm_spawn call. MPI requires that collective calls on a
> given communicator be made in the same order by all participating tasks.
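>
> A minimal sketch of that serialized, MPI_COMM_SELF-based pattern
> (an illustration only; assumes pthreads, and worker_path is a placeholder):
>
>    #include <mpi.h>
>    #include <pthread.h>
>
>    static pthread_mutex_t spawn_mutex = PTHREAD_MUTEX_INITIALIZER;
>
>    /* Spawning on MPI_COMM_SELF makes the call collective over this
>       task alone, so serializing it with a local mutex creates no
>       ordering dependency on other tasks. */
>    MPI_Comm spawn_worker(const char *worker_path)
>    {
>        MPI_Comm intercomm;
>        pthread_mutex_lock(&spawn_mutex);
>        MPI_Comm_spawn((char *)worker_path, MPI_ARGV_NULL, 1 /* maxprocs */,
>                       MPI_INFO_NULL, 0 /* root */, MPI_COMM_SELF,
>                       &intercomm, MPI_ERRCODES_IGNORE);
>        pthread_mutex_unlock(&spawn_mutex);
>        return intercomm;
>    }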
>
> If there are two or more tasks making the MPI_Comm_spawn call
> collectively from multiple threads (even with per-thread input
> communicators) then using a local lock this way is pretty sure to
> deadlock at some point. Say task 0 serializes spawning threads as A
> then B and task 1 serializes them as B then A. The job will deadlock
> because task 0 cannot free its lock for thread A until task 1 makes
> the spawn call for thread A as well. That will never happen if task
> 1 is stuck in a lock that will not release until task 0 makes its
> call for thread B.
>
> When you look at the code for a particular task and consider thread
> interactions within the task, the use of the lock looks safe. It is
> only when you consider the dependencies on what other tasks are
> doing that the danger becomes clear. This particular case is pretty
> easy to see, but sometimes, when there is a temptation to hold a local
> mutex across a blocking MPI call, the chain of dependencies that
> can lead to deadlock becomes very hard to predict.
>
> BTW - maybe this is obvious, but you also need to protect the logic
> which calls MPI_Init_thread to make sure you do not have a race in
> which two threads each race to test the flag for whether
> MPI_Init_thread has already been called. If two threads do:
> 1) if (MPI_Inited_flag == FALSE) {
> 2) set MPI_Inited_flag
> 3) MPI_Init_thread
> 4) }
> you have a couple of race conditions:
> 1) Two threads may both try to call MPI_Init_thread if one thread
> tests "if (MPI_Inited_flag == FALSE)" while the other is between
> statements 1 & 2.
> 2) If some thread tests "if (MPI_Inited_flag == FALSE)" while
> another thread is between statements 2 and 3, that thread could
> assume MPI_Init_thread is done and make the MPI_Comm_spawn call
> before the thread that is trying to initialize MPI manages to do it.
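>
> One way to close both races is to hold a mutex across the test and
> the MPI_Init_thread call together, and set the flag only after the
> call returns (a sketch, assuming pthreads):
>
>    #include <mpi.h>
>    #include <pthread.h>
>
>    static pthread_mutex_t init_mutex = PTHREAD_MUTEX_INITIALIZER;
>    static int mpi_inited = 0;
>
>    void ensure_mpi_init(void)
>    {
>        pthread_mutex_lock(&init_mutex);
>        if (!mpi_inited) {
>            int provided;
>            MPI_Init_thread(NULL, NULL, MPI_THREAD_MULTIPLE, &provided);
>            mpi_inited = 1;   /* set only after init has completed */
>        }
>        pthread_mutex_unlock(&init_mutex);
>    }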
>
> Dick
>
>
> Dick Treumann - MPI Team
> IBM Systems & Technology Group
> Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
> Tele (845) 433-7846 Fax (845) 433-8363
>
>
> users-bounces@open-mpi.org wrote on 09/17/2009 11:36:48 PM:
>
> > Re: [OMPI users] Multi-threading with OpenMPI ?
> > Ralph Castain to Open MPI Users, 09/17/2009 11:37 PM
> >
> > Only thing I can suggest is to place a thread lock around the call to  
> > comm_spawn so that only one thread at a time can execute that  
> > function. The call to mpi_init_thread is fine - you just need to  
> > explicitly protect the call to comm_spawn.
> >
> >

>
>


_______________________________________________
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users