Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Process size
From: Leonardo Fialho (lfialho_at_[hidden])
Date: 2008-05-30 08:09:33


Josh,

Some time ago I was studying the CRCP component. I'm not sure, but I
remember that this component is used for bookmark exchange. Do you store
this information exactly for that purpose (bookmark exchange)? After a
successful checkpoint operation, can this memory be freed?

Thanks,
Leonardo

Josh Hursey wrote:
> Leonardo,
>
> You are exactly correct. The CRCP module/component will probably grow
> the application size with every message that you send or receive.
> This is because the CRCP component tracks the signature {data_size,
> tag, communicator, peer} (*not* the contents of the message) of every
> message sent/received.
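
To make the growth concrete, here is a minimal sketch of per-message signature bookkeeping. It is not the CRCP code: the names msg_sig, sig_list, sig_list_track, sig_list_bytes and sig_list_clear are hypothetical, the linked list is an assumed layout, and the communicator is reduced to a plain integer id. It only shows that one small allocation per message adds up linearly when nothing is released.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record of one message signature: the four fields named in
 * the reply above. The message contents are deliberately not stored. */
typedef struct msg_sig {
    size_t data_size;      /* payload size in bytes */
    int tag;               /* MPI tag */
    int communicator;      /* stand-in id for the communicator (assumption) */
    int peer;              /* rank of the other endpoint */
    struct msg_sig *next;  /* next record in the list */
} msg_sig;

/* Hypothetical tracker: a singly linked list plus a running count. */
typedef struct {
    msg_sig *head;
    size_t count;
} sig_list;

/* Record one signature. Returns 0 on success, -1 if malloc fails. */
int sig_list_track(sig_list *list, size_t data_size, int tag,
                   int communicator, int peer)
{
    assert(list != NULL);
    msg_sig *s = malloc(sizeof *s);
    if (s == NULL) {
        return -1;
    }
    s->data_size = data_size;
    s->tag = tag;
    s->communicator = communicator;
    s->peer = peer;
    s->next = list->head;  /* push at the front of the list */
    list->head = s;
    list->count++;
    return 0;
}

/* Bytes currently held by the records (allocator overhead not included). */
size_t sig_list_bytes(const sig_list *list)
{
    assert(list != NULL);
    return list->count * sizeof(msg_sig);
}

/* Release every record and reset the tracker to empty. */
void sig_list_clear(sig_list *list)
{
    assert(list != NULL);
    while (list->head != NULL) {
        msg_sig *s = list->head;
        list->head = s->next;
        free(s);
    }
    list->count = 0;
}
```

Calling sig_list_track once per send or receive makes sig_list_bytes grow by sizeof(msg_sig) each time, which matches the "growing all the time" pattern with a 0 second delay. sig_list_clear shows what releasing the records would look like in this toy model; whether and when the real component can do that (for example after a checkpoint) is exactly the open question in this thread.
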
>
> I have in development some fixes for the CRCP component to make it
> behave a bit better for large numbers of messages, and as a result
> will also help control the number of memory allocations needed by this
> component. Unfortunately it is not 100% ready for public use at the
> moment, but hopefully soon.
>
> As an aside: to clearly see the effect of turning the CRCP component
> on/off at runtime try the two commands below:
> Without CRCP:
> shell$ mpirun -np 2 -am ft-enable-cr -mca crcp none simple-ping 20 1
> With CRCP:
> shell$ mpirun -np 2 -am ft-enable-cr simple-ping 20 1
>
> -- Josh
>
> On May 29, 2008, at 7:54 AM, Leonardo Fialho wrote:
>
>
>> Hi All,
>>
>> I ran some tests with a dummy "ping" application and saw some memory
>> problems. In these tests I obtained the following results:
>>
>> 1) Open MPI (without FT):
>> - with a 1 second delay before sending the token to the other node:
>> orted and application size stable;
>> - with a 0 second delay before sending the token to the other node:
>> orted and application size stable.
>>
>> 2) Open MPI (with CRCP FT):
>> - with a 1 second delay before sending the token to the other node:
>> orted stable, application size grows during the first seconds and
>> then stabilizes;
>> - with a 0 second delay before sending the token to the other node:
>> orted stable, application size keeps growing all the time.
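
One way to take such measurements from inside the application, as a sketch only: the POSIX call getrusage() reports the peak resident set size of the calling process in ru_maxrss. The helper names peak_rss and report_rss are hypothetical, and the unit is platform-dependent (kilobytes on Linux, bytes on macOS), so the numbers are only comparable on the same system.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Peak resident set size of the calling process, or -1 on error.
 * Units are platform-dependent: kilobytes on Linux, bytes on macOS. */
long peak_rss(void)
{
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) != 0) {
        return -1;
    }
    return usage.ru_maxrss;
}

/* Print one sample for the given rank and loop iteration. */
void report_rss(int rank, int iteration)
{
    assert(iteration >= 0);
    printf("(%d) iteration %d: peak RSS %ld\n", rank, iteration, peak_rss());
    fflush(stdout);
}
```

Calling report_rss(rank, fim) every few hundred iterations of the ping loop would show a flat value in the stable cases and a steadily rising one in the growing case. Because ru_maxrss is a high-water mark, it can show growth but never a later decrease.
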
>>
>> I think that it is something in the CRCP module/component...
>>
>> Thanks,
>>
>> --
>> Leonardo Fialho
>> Computer Architecture and Operating Systems Department - CAOS
>> Universidad Autonoma de Barcelona - UAB
>> ETSE, Edifcio Q, QC/3088
>> http://www.caos.uab.es
>> Phone: +34-93-581-2888
>> Fax: +34-93-581-2478
>>
>> #include <mpi.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <unistd.h>   /* sleep() */
>>
>> int main (int argc, char *argv[]) {
>>     double time_end, time_start;
>>     int count, rank, fim, next;
>>     char buffer[6] = "test!";   /* 5 characters plus the terminating NUL */
>>     MPI_Status status;
>>
>>     if (3 > argc) {
>>         printf("\n Insufficient arguments (%d)\n\n ping <times> <delay>\n\n",
>>                argc);
>>         exit(1);
>>     }
>>
>>     /* Without a working MPI there is nothing to time or finalize. */
>>     if (MPI_Init(&argc, &argv) != MPI_SUCCESS) {
>>         exit(1);
>>     }
>>
>>     time_start = MPI_Wtime();
>>     MPI_Comm_size (MPI_COMM_WORLD, &count);
>>     MPI_Comm_rank (MPI_COMM_WORLD, &rank );
>>     next = (rank == (count-1) ? 0 : rank+1);   /* successor in the ring */
>>
>>     for (fim = 1; fim <= atoi(argv[1]); fim++) {
>>         if (rank == 0) {
>>             /* Rank 0 injects the token, then waits for it to return. */
>>             printf("(%d) sent token to (%d)\n", rank, next);
>>             fflush(stdout);
>>             sleep(atoi(argv[2]));
>>             MPI_Send(buffer, 5, MPI_CHAR, next, 1, MPI_COMM_WORLD);
>>             MPI_Recv(buffer, 5, MPI_CHAR, count-1, 1,
>>                      MPI_COMM_WORLD, &status);
>>         } else {
>>             /* Other ranks receive from their predecessor and forward. */
>>             MPI_Recv(buffer, 5, MPI_CHAR, rank-1, 1,
>>                      MPI_COMM_WORLD, &status);
>>             printf("(%d) sent token to (%d)\n", rank, next);
>>             fflush(stdout);
>>             sleep(atoi(argv[2]));
>>             MPI_Send(buffer, 5, MPI_CHAR, next, 1, MPI_COMM_WORLD);
>>         }
>>     }
>>
>>     time_end = MPI_Wtime();
>>     MPI_Finalize();
>>
>>     if (rank == 0) {
>>         printf("%f\n", time_end - time_start);
>>     }
>>
>>     return 0;
>> }
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>

-- 
Leonardo Fialho
Computer Architecture and Operating Systems Department - CAOS
Universidad Autonoma de Barcelona - UAB
ETSE, Edifcio Q, QC/3088
http://www.caos.uab.es
Phone: +34-93-581-2888
Fax: +34-93-581-2478