Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Process size
From: Josh Hursey (jjhursey_at_[hidden])
Date: 2008-05-30 09:33:15


Leonardo,

The CRCP 'coord' component implements the bookmark exchange. I store
the message signatures for the bookmark exchange. Since I am
implementing this above the point-to-point stack in Open MPI (PML) I
need to keep track of this message information to implement post-
checkpoint resolution of drained messages.

After a successful checkpoint operation I should be able to free the
memory for most of the messages, excluding those that were drained
during the checkpoint operation but not fully matched. Unfortunately,
when I looked back at the code I noticed that I was *not* freeing any
memory, but simply continuing to append message records as usual. This
works correctly, but becomes a resource and performance problem fairly
quickly for large numbers of messages.

The re-work of the 'coord' component that I am currently working on
will be more careful with memory. I'll let you know when the new
component is made available.

Cheers,
Josh

On May 30, 2008, at 8:09 AM, Leonardo Fialho wrote:

> Josh,
>
> Some time ago I was studying the CRCP component. I'm not sure, but I
> remember that this component is used for the bookmark exchange. Do you
> store this information specifically for the bookmark exchange? After a
> successful checkpoint operation, can you free this memory?
>
> Thanks,
> Leonardo
>
> Josh Hursey wrote:
>> Leonardo,
>>
>> You are exactly correct. The CRCP module/component will probably grow
>> the application size with every message that you send or receive.
>> This is because the CRCP component tracks the signature {data_size,
>> tag, communicator, peer} (*not* the contents of the message) of every
>> message sent/received.
>>
>> I have in development some fixes for the CRCP component to make it
>> behave a bit better for large numbers of messages, and as a result
>> will also help control the number of memory allocations needed by
>> this component. Unfortunately it is not 100% ready for public use at
>> the moment, but hopefully soon.
>>
>> As an aside: to clearly see the effect of turning the CRCP component
>> on/off at runtime try the two commands below:
>> Without CRCP:
>> shell$ mpirun -np 2 -am ft-enable-cr -mca crcp none simple-ping 20 1
>> With CRCP:
>> shell$ mpirun -np 2 -am ft-enable-cr simple-ping 20 1
>>
>> -- Josh
>>
>> On May 29, 2008, at 7:54 AM, Leonardo Fialho wrote:
>>
>>
>>> Hi All,
>>>
>>> I ran some tests with a dummy "ping" application and saw some memory
>>> problems. In these tests I obtained the following results:
>>>
>>> 1) Open MPI (without FT):
>>> - delaying 1 second before sending the token to the other node: orted
>>> and application size stable;
>>> - delaying 0 seconds before sending the token to the other node: orted
>>> and application size stable.
>>>
>>> 2) Open MPI (with CRCP FT):
>>> - delaying 1 second before sending the token to the other node: orted
>>> stable; application size grows in the first seconds and then stabilizes;
>>> - delaying 0 seconds before sending the token to the other node: orted
>>> stable; application size grows all the time.
>>>
>>> I think that it is something in the CRCP module/component...
>>>
>>> Thanks,
>>>
>>> --
>>> Leonardo Fialho
>>> Computer Architecture and Operating Systems Department - CAOS
>>> Universidad Autonoma de Barcelona - UAB
>>> ETSE, Edifcio Q, QC/3088
>>> http://www.caos.uab.es
>>> Phone: +34-93-581-2888
>>> Fax: +34-93-581-2478
>>>
>>> #include <mpi.h>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>> #include <unistd.h>  /* sleep() */
>>>
>>> int main (int argc, char *argv[]) {
>>>     double time_end, time_start;
>>>     int count, rank, fim;
>>>     char buffer[6] = "test!";  /* 5 characters plus the terminating NUL */
>>>     MPI_Status status;
>>>
>>>     if (3 > argc) {
>>>         printf("\n Insufficient arguments (%d)\n\n ping <times> <delay>\n\n", argc);
>>>         exit(1);
>>>     }
>>>
>>>     /* Nothing below is valid if MPI failed to start. */
>>>     if (MPI_Init(&argc, &argv) != MPI_SUCCESS) {
>>>         return 1;
>>>     }
>>>
>>>     time_start = MPI_Wtime();
>>>     MPI_Comm_size (MPI_COMM_WORLD, &count);
>>>     MPI_Comm_rank (MPI_COMM_WORLD, &rank );
>>>
>>>     /* Pass the token around the ring <times> times, pausing <delay> seconds. */
>>>     for (fim = 1; fim <= atoi(argv[1]); fim++) {
>>>         if (rank == 0) {
>>>             /* Rank 0 starts each round, then waits for the token to return. */
>>>             printf("(%d) sent token to (%d)\n", rank, rank+1);
>>>             fflush(stdout);
>>>             sleep(atoi(argv[2]));
>>>             MPI_Send(buffer, 5, MPI_CHAR, 1, 1, MPI_COMM_WORLD);
>>>             MPI_Recv(buffer, 5, MPI_CHAR, count-1, 1,
>>>                      MPI_COMM_WORLD, &status);
>>>         } else {
>>>             /* Other ranks receive from the previous rank and forward. */
>>>             MPI_Recv(buffer, 5, MPI_CHAR, rank-1, 1,
>>>                      MPI_COMM_WORLD, &status);
>>>             printf("(%d) sent token to (%d)\n", rank,
>>>                    (rank==(count-1) ? 0 : rank+1));
>>>             fflush(stdout);
>>>             sleep(atoi(argv[2]));
>>>             MPI_Send(buffer, 5, MPI_CHAR, (rank==(count-1) ? 0 : rank+1),
>>>                      1, MPI_COMM_WORLD);
>>>         }
>>>     }
>>>
>>>     time_end = MPI_Wtime();
>>>     MPI_Finalize();
>>>
>>>     if (rank == 0) {
>>>         printf("%f\n", time_end - time_start);
>>>     }
>>>
>>>     return 0;
>>> }
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>