Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Heap profiling with OpenMPI
From: George Bosilca (bosilca_at_[hidden])
Date: 2008-08-06 05:12:20


Jan,

I'm using the latest version of Open MPI compiled with debugging
enabled, and valgrind 3.3.0. From your trace, it looks like there is a
conflict between two memory managers. I don't see the same problem
because I disable the Open MPI memory manager in my builds (configure
option --without-memory-manager).
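A minimal sketch of the rebuild described above, assuming an unpacked Open MPI source tree of the release being tested. The install prefix $HOME/openmpi-debug is a hypothetical example path, --enable-debug stands in for the "debug turned on" build, and the final mpirun line is the massif command from the test quoted further down. With -bynode, the new build also has to be first on PATH and LD_LIBRARY_PATH on every node.

```shell
# Run from the top of an unpacked Open MPI source tree.
# $HOME/openmpi-debug is a hypothetical install prefix.
./configure --prefix="$HOME/openmpi-debug" \
            --enable-debug \
            --without-memory-manager
make all install

# Put the new build ahead of any other Open MPI installation.
export PATH="$HOME/openmpi-debug/bin:$PATH"
export LD_LIBRARY_PATH="$HOME/openmpi-debug/lib:$LD_LIBRARY_PATH"

# Profile each rank with massif (same command line as in the thread).
mpirun -np 2 -bynode --mca btl tcp,self valgrind --tool=massif -q ./NPmpi -u 4
```

Without Open MPI's own malloc/free in libopen-pal, valgrind's allocator replacement is the only one left. That should avoid the _int_free crash shown in the trace.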

   george.

On Aug 6, 2008, at 9:29 AM, Jan Ploski wrote:

> users-bounces_at_[hidden] wrote on 08/05/2008 05:51:51 PM:
>
>> Jan,
>>
>> I'm using valgrind with Open MPI on a [very] regular basis and I
>> have never had any problems. I usually want to know the execution
>> path of the MPI applications. For this I use:
>> mpirun -np XX valgrind --tool=callgrind -q --log-file=some_file ./my_app
>>
>> I just ran your example:
>> mpirun -np 2 -bynode --mca btl tcp,self valgrind --tool=massif -q ./NPmpi -u 4
>> and I got 2 non-empty files in the current directory:
>> bosilca_at_dancer:~/NetPIPE_3.6.2$ ls -l massif.out.*
>> -rw------- 1 bosilca bosilca 140451 2008-08-05 11:57 massif.out.21197
>> -rw------- 1 bosilca bosilca 131471 2008-08-05 11:57 massif.out.21210
>
> George,
>
> Thanks for the info. Which versions of OpenMPI, compiler, and valgrind
> did you use? I checked on two different clusters with OpenMPI 1.2.4
> compiled with two different versions of the PGI compiler and valgrind
> 3.3.1, with the same bad result. I also noticed that the MPI processes,
> despite producing the expected output, do not terminate cleanly. I can
> see this in the stderr log (for each process):
>
> ==7909== Warning: client syscall munmap tried to modify addresses 0xD1968F92A19A72D1-0x34324E6F
> ==7909==
> ==7909== Process terminating with default action of signal 11 (SIGSEGV)
> ==7909== Access not within mapped region at address 0x8053D8000
> ==7909== at 0x5284996: _int_free (in /opt/openmpi-1.2.4/lib/libopen-pal.so.0.0.0)
> ==7909== by 0x52837A7: free (in /opt/openmpi-1.2.4/lib/libopen-pal.so.0.0.0)
> ==7909== by 0x593C76A: free_mem (in /lib64/libc-2.4.so)
> ==7909== by 0x593C3E1: __libc_freeres (in /lib64/libc-2.4.so)
> ==7909== by 0x491D31C: _vgnU_freeres (vg_preloaded.c:60)
> ==7909== by 0x587D1C4: exit (in /lib64/libc-2.4.so)
> ==7909== by 0x586815A: (below main) (in /lib64/libc-2.4.so)
>
> That probably explains why my massif.out.* files are essentially empty
> (<200 bytes long), but why do the processes crash? The same program
> runs fine with valgrind+MVAPICH, and with OpenMPI without valgrind, on
> the respective clusters. I see this both with a simple test program
> and with a real application (WRF).
>
> Regards,
> Jan Ploski
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
