With version 1.8 it works fine :D
I replaced all the MPI_Finalize() calls with exit(). Of course, for the processes that
continue "until the end" I added a barrier on a communicator that involves
only those processes.
Maybe in future versions it would be a good idea to let users change the
internal communicator used in MPI_Finalize().
Jeff, thanks for your advice, and sorry for writing "threads" instead of "processes";
it was my mistake.
----- Original message -----
From: "Ralph Castain" <rhc_at_[hidden]>
To: "Open MPI Users" <users_at_[hidden]>
Sent: Friday, May 23, 2014 21:08:05
Subject: Re: [OMPI users] MPI_Finalize() maintains load at 100%.
Hmmm...okay, good news and bad news :-)
Good news: this works fine on 1.8, so I'd suggest updating to that release series (either 1.8.1 or the nightly 1.8.2)
Bad news: if one proc is going to exit without calling Finalize, they all need to do so else you will hang in Finalize. The problem is that Finalize invokes a barrier, and some of the procs aren't there any more to participate.
On May 23, 2014, at 12:03 PM, Ralph Castain <rhc_at_[hidden]> wrote:
> I'll check to see - should be working
> On May 23, 2014, at 8:07 AM, Iván Cores González <ivan.coresg_at_[hidden]> wrote:
>>> I assume you mean have them exit without calling MPI_Finalize ...
>> Yes, that's my idea: some processes exit while the others continue. I am trying to
>> use the "orte_allowed_exit_without_sync" flag in the following code (note that the code
>> is different):
>> #include <mpi.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>> int main( int argc, char *argv[] )
>> {
>>     MPI_Init(&argc, &argv);
>>     int myid;
>>     MPI_Comm_rank(MPI_COMM_WORLD, &myid);
>>     if (myid == 0)
>>     {
>>         printf("Exit P0 ...\n");
>>         //With "--mca orte_allowed_exit_without_sync 1" this
>>         //process should die, but not P1, P2 ... , is ok?
>>         exit(0);
>>     }
>>     //Imagine some important job here
>>     printf("Calling MPI_Finalize() ...\n");
>>     // Process 0 maintains load at 100%.
>>     MPI_Finalize();
>>     return 0;
>> }
>> and the cmd:
>> mpirun --mca orte_allowed_exit_without_sync 1 -hostfile ./hostfile -np 2 --prefix /share/apps/openmpi/gcc/ib/1.7.2 ./a.out
>> But it does not work; the whole job fails at the "exit(0)" call. Maybe I don't understand your response...
>> Sorry for not replying in order; I have some problems with my
>> e-mail receiving the Open MPI mails.
>>> In my codes, I am using the MPI_Send and MPI_Recv functions to notify P0 that
>>> every other process has finished its own calculations. Maybe you can
>>> also use the same method and keep P0 waiting until it receives some data
>>> from the other processes?
>> This solution was my first idea, but I can't do it. I use spawned processes and
>> different communicators to manage "groups" of processes, so the ideal behaviour
>> is that processes finish and die (or at least don't stay at 100% load) when
>> they finish their work. It's a bit hard to explain.
>> ----- Original message -----
>> From: "Ralph Castain" <rhc_at_[hidden]>
>> To: "Open MPI Users" <users_at_[hidden]>
>> Sent: Friday, May 23, 2014 16:39:34
>> Subject: Re: [OMPI users] MPI_Finalize() maintains load at 100%.
>> On May 23, 2014, at 7:21 AM, Iván Cores González <ivan.coresg_at_[hidden]> wrote:
>>> Hi Ralph,
>>> Thanks for your response.
>>> I see your point. I tried to change the algorithm, but some processes finish while the others are still calling MPI functions. I can't avoid this behaviour.
>>> The ideal behaviour would be for the processes to go to sleep (or at least not use 100% of the CPU) when MPI_Finalize is called.
>>> For the time being, maybe the fastest solution is to insert a "manual" sleep before MPI_Finalize.
>>> Another question: would it be possible to kill some MPI processes without making mpirun fail? Or is this behaviour impossible?
>> I assume you mean have them exit without calling MPI_Finalize, so they don't block? Technically, yes, though we wouldn't recommend that behavior. You can add "-mca orte_allowed_exit_without_sync 1" to your cmd line (or set the mca param in your environment, etc.) and mpirun won't terminate you if a proc exits without calling MPI_Finalize. We will still, however, terminate the job if (a) a proc dies by signal (e.g., segfaults), or (b) a proc exits with a non-zero status, so you'll still have some protection from hangs.
>>> Ivan Cores
>>> users mailing list