
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] MPI_Finalize() maintains load at 100%.
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-05-23 15:08:05


Hmmm...okay, good news and bad news :-)

Good news: this works fine on 1.8, so I'd suggest updating to that release series (either 1.8.1 or the nightly 1.8.2)

Bad news: if one proc is going to exit without calling Finalize, they all need to do so, or else you will hang in Finalize. The problem is that Finalize invokes a barrier, and some of the procs aren't there any more to participate.
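
For illustration, here is a minimal sketch of what "all exit without calling Finalize" would look like for your test code below. Treat it as a rough workaround sketch, assuming you keep the orte_allowed_exit_without_sync flag on your command line, and not as a recommended pattern:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main( int argc, char *argv[] )
{
    MPI_Init(&argc, &argv);

    int myid;
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    if (myid == 0)
    {
        printf("Exit P0 ...\n");
        exit(0);    /* P0 leaves early and never calls MPI_Finalize() */
    }

    /* The other ranks keep working */
    sleep(20);

    /* P0 never reaches the barrier inside MPI_Finalize(), so the
       remaining ranks must also skip it or they would hang there. */
    printf("Exit P%d without MPI_Finalize() ...\n", myid);
    exit(0);
}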

On May 23, 2014, at 12:03 PM, Ralph Castain <rhc_at_[hidden]> wrote:

> I'll check to see - should be working
>
> On May 23, 2014, at 8:07 AM, Iván Cores González <ivan.coresg_at_[hidden]> wrote:
>
>>> I assume you mean have them exit without calling MPI_Finalize ...
>>
>> Yes, that's my idea: exit some processes while the others continue. I am trying to
>> use the "orte_allowed_exit_without_sync" flag with the following code (note that the code
>> is different):
>>
>> #include <mpi.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <unistd.h>
>>
>> int main( int argc, char *argv[] )
>> {
>>     MPI_Init(&argc, &argv);
>>
>>     int myid;
>>     MPI_Comm_rank(MPI_COMM_WORLD, &myid);
>>
>>     if (myid == 0)
>>     {
>>         printf("Exit P0 ...\n");
>>         // With "--mca orte_allowed_exit_without_sync 1" this
>>         // process should die, but not P1, P2, ... Is that OK?
>>         exit(0);
>>     }
>>
>>     // Imagine some important job here
>>     sleep(20);
>>
>>     printf("Calling MPI_Finalize() ...\n");
>>     // Process 0 maintains load at 100%.
>>     MPI_Finalize();
>>     return 0;
>> }
>>
>> and the cmd:
>> mpirun --mca orte_allowed_exit_without_sync 1 -hostfile ./hostfile -np 2 --prefix /share/apps/openmpi/gcc/ib/1.7.2 ./a.out
>>
>> But it does not work: the whole job fails at the "exit(0)" call. Maybe I don't understand your response...
>>
>>
>> Sorry for not responding in order; I am having some problems with my
>> e-mail receiving the Open MPI mails.
>>
>>> In my codes, I am using the MPI_Send and MPI_Recv functions to notify P0 that
>>> every other process has finished its own calculations. Maybe you can
>>> also use the same method and keep P0 waiting until it receives some data
>>> from the other processes?
>>
>> This solution was my first idea, but I can't do it. I use spawned processes and
>> different communicators to manage "groups" of processes, so the ideal behaviour
>> is that processes finish and die (or at least don't stay at 100% load) when
>> they finish their work. It's a bit hard to explain.
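>>
>> For reference, the notification pattern suggested above might look roughly like the
>> sketch below; the tag value and the "done" payload are just placeholders. Note that
>> a blocking MPI_Recv typically busy-polls in Open MPI, so P0 would still sit at 100%
>> CPU while it waits:
>>
>> #include <mpi.h>
>>
>> int main( int argc, char *argv[] )
>> {
>>     MPI_Init(&argc, &argv);
>>
>>     int myid, nprocs;
>>     MPI_Comm_rank(MPI_COMM_WORLD, &myid);
>>     MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
>>
>>     if (myid == 0)
>>     {
>>         // P0 waits for a "done" message from every other rank
>>         for (int src = 1; src < nprocs; src++)
>>         {
>>             int done;
>>             MPI_Recv(&done, 1, MPI_INT, src, 0, MPI_COMM_WORLD,
>>                      MPI_STATUS_IGNORE);
>>         }
>>     }
>>     else
>>     {
>>         // ... the real work of ranks 1..N-1 would go here ...
>>         int done = 1;
>>         MPI_Send(&done, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
>>     }
>>
>>     // Every rank reaches MPI_Finalize() together
>>     MPI_Finalize();
>>     return 0;
>> }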
>>
>>
>>
>>
>> ----- Original Message -----
>> From: "Ralph Castain" <rhc_at_[hidden]>
>> To: "Open MPI Users" <users_at_[hidden]>
>> Sent: Friday, May 23, 2014 16:39:34
>> Subject: Re: [OMPI users] MPI_Finalize() maintains load at 100%.
>>
>>
>> On May 23, 2014, at 7:21 AM, Iván Cores González <ivan.coresg_at_[hidden]> wrote:
>>
>>> Hi Ralph,
>>> Thanks for your response.
>>> I see your point. I tried to change the algorithm, but some processes finish while the others are still calling MPI functions. I can't avoid this behaviour.
>>> The ideal behaviour is that the processes go to sleep (or don't use 100% of the CPU) when MPI_Finalize is called.
>>>
>>> For the time being, maybe the fastest solution is to insert a "manual" sleep before the MPI_Finalize.
>>>
>>> Another question: would it be possible to kill some MPI processes without making mpirun fail? Or is this behaviour impossible?
>>
>> I assume you mean have them exit without calling MPI_Finalize, so they don't block? Technically, yes, though we wouldn't recommend that behavior. You can add "-mca orte_allowed_exit_without_sync 1" to your cmd line (or set the MCA param in your environment, etc.), and mpirun won't terminate the job if a proc exits without calling MPI_Finalize. We will still, however, terminate the job if (a) a proc dies by signal (e.g., segfaults), or (b) a proc exits with a non-zero status, so you'll still have some protection from hangs.
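>>
>> For example, a minimal sketch of that rule (this toy program assumes the flag is on
>> the mpirun command line, and every rank exits early):
>>
>> #include <mpi.h>
>> #include <stdlib.h>
>>
>> int main( int argc, char *argv[] )
>> {
>>     MPI_Init(&argc, &argv);
>>
>>     /* With "-mca orte_allowed_exit_without_sync 1", exiting here with
>>        status 0 and no MPI_Finalize() does not make mpirun kill the job.
>>        Exiting with a non-zero status, or dying from a signal such as a
>>        segfault, still brings the whole job down. */
>>     exit(0);
>> }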
>>
>>>
>>> Thanks,
>>> Ivan Cores