Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI_AllGather null terminator character
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2012-01-28 06:57:53


On Jan 28, 2012, at 5:22 AM, Gabriele Fatigati wrote:

> I had the same idea, so in my simple code I have already used calloc and memset.
>
> The same warning still appears when using strncmp, which should exclude the uninitialized bytes in hostnam_recv_buf :(

Bummer.

> My apologies for being so insistent, but I would like to understand whether there is a bug in MPI_Allgather, strcmp, or Valgrind itself.

Understood.

I still think that MPI_Allgather sends exactly the number of bytes you specify, starting at the buffer you pass in, regardless of whether they include a \0 or not.
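
Since every byte in the count is sent, every byte of the send buffer should be initialized, not just the part up to the \0. Here is a minimal sketch of filling a fixed-size hostname slot that way. The names fill_slot and HOSTNAME_SLOT are made up for illustration and are not from your code, and this is plain C with no MPI calls in it:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed slot size per rank; not taken from the original code. */
#define HOSTNAME_SLOT 64

/* Zero the whole slot, then copy the name in, always leaving a trailing \0.
 * Afterwards all slot_len bytes are initialized, which matters because a
 * count-based send transmits all of them, not just the bytes up to \0. */
static void fill_slot(char *slot, size_t slot_len, const char *name)
{
    assert(slot != NULL && name != NULL && slot_len > 0);
    memset(slot, 0, slot_len);
    /* Copy at most slot_len - 1 characters so the last byte stays \0. */
    strncpy(slot, name, slot_len - 1);
}
```

You would then pass the slot to MPI_Allgather as the send buffer with a count of HOSTNAME_SLOT and type MPI_CHAR, and size the receive buffer as HOSTNAME_SLOT times the number of ranks.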

I was unable to replicate the valgrind warning on my systems. A few more things to try:

1. Are you using the latest version of Valgrind?

2. (I should have asked this before - sorry!) Are you using InfiniBand to transmit the data across your network? If so, Valgrind might not have visibility into the receive buffers being filled, because IB, by its nature, uses OS bypass to fill in receive buffers. Meaning: Valgrind won't "see" the receive buffers getting filled, and will therefore think that they are uninitialized. If you re-run your experiment with TCP and/or shared memory (like I did), you should not see the Valgrind uninitialized warnings.

To avoid these OS-bypass issues, you might try building Open MPI with --with-valgrind=DIR (DIR = directory where Valgrind is installed -- we need valgrind.h, IIRC). This allows Open MPI to use Valgrind's client-request API to say "don't worry Valgrind, the entire contents of this buffer are initialized" in cases exactly like this.

There is a performance cost to using Valgrind integration, though. So don't make this your production copy of Open MPI.

3. Write a for loop that accesses each position of the buffer *before* you send it. Not just up to the \0, but traverse the *entire length* of the buffer and do some meaningless action with each byte. See if Valgrind complains. If it doesn't, you know that the source buffer is not the origin of the warning.

4. Similarly, do a loop accessing each position of the received buffer. You can have Valgrind attach a debugger when it runs into issues; with that, you can see exactly which position Valgrind thinks is uninitialized. Compare the value that was sent to the value that was received and ensure that they are the same.
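
Steps 3 and 4 could be sketched roughly as follows. The helper names count_nonzero_bytes and first_mismatch are made up for illustration, and nothing here calls MPI or Valgrind directly. The idea is that Memcheck reports uninitialized data when a branch depends on it, so the first helper branches on every byte (build without optimization so the compiler keeps the branch):

```c
#include <assert.h>
#include <stddef.h>

/* Branch on every one of the len bytes so that Valgrind's Memcheck has to
 * check each of them; the returned count is the "meaningless action". */
static size_t count_nonzero_bytes(const char *buf, size_t len)
{
    size_t nonzero = 0;
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != '\0') {   /* conditional jump depends on buf[i] */
            nonzero++;
        }
    }
    return nonzero;
}

/* Return the index of the first byte where sent and received differ,
 * or len if all len bytes are identical. */
static size_t first_mismatch(const char *sent, const char *received,
                             size_t len)
{
    assert((sent != NULL && received != NULL) || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (sent[i] != received[i]) {
            return i;
        }
    }
    return len;
}
```

Call count_nonzero_bytes on the full send buffer before MPI_Allgather and on the full receive buffer after it. On each rank, first_mismatch can then compare that rank's own send buffer against its slot in the receive buffer.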

Hope that helps!

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/