Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI_AllGather null terminator character
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2012-01-27 11:58:57


Ah, I have an idea what might be happening here: I believe that valgrind is actually pretty smart.

If you have a buffer of size 128, and gethostname() only fills in, say, the first 32 bytes (including the \0), the other 128-32=96 bytes are uninitialized. You can MPI_Allgather these, in which case those 96 uninitialized bytes will be copied over to the hostname_recv_buf buffer.

For each rank, valgrind can actually track the local memcpy from local_hostname to hostname_recv_buf[rank * MAX_LEN_SIZE], and it knows that those 96 bytes are still uninitialized.

So when you go to strcmp them later, valgrind says "ah ha! those are uninitialized!"

Meaning: I think that in some cases, valgrind is actually tracking the memcpy of uninitialized bytes and then alerting you later when you access those secondary uninitialized bytes.
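
To make the slot layout concrete, here is a minimal sketch in plain C, with no MPI calls, of how the gathered buffer can be searched. It assumes each rank owns a fixed-width slot of slot_len bytes; slot_at and find_slot are hypothetical helper names, not part of MPI or of the original code. The lookup only trusts a slot if it has a \0 inside its own bounds, so the comparison cannot run into the next rank's slot. This does not by itself silence valgrind; it only bounds the read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return a pointer to the fixed-width slot that `rank` occupies in the
 * gathered buffer (hypothetical helper, not an MPI function). */
const char *slot_at(const char *recv_buf, size_t slot_len, int rank)
{
    return recv_buf + (size_t)rank * slot_len;
}

/* Return the first rank whose slot holds `name`, or -1 if none does.
 * A slot is only compared if it contains a \0 within its slot_len
 * bytes, so strcmp never reads past the end of that slot. */
int find_slot(const char *recv_buf, int nprocs, size_t slot_len,
              const char *name)
{
    int i;

    assert(recv_buf != NULL && name != NULL);
    for (i = 0; i < nprocs; ++i) {
        const char *slot = slot_at(recv_buf, slot_len, i);
        if (memchr(slot, '\0', slot_len) != NULL && strcmp(slot, name) == 0) {
            return i;
        }
    }
    return -1;
}
```

In the code from this thread, the call would presumably look like find_slot(hostname_recv_buf, num_procs, MAX_STRING_LEN, local_hostname) in place of the strcmp loop.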

If I'm right, you can memset the local_hostname buffer (or use calloc) before calling gethostname(), and the valgrind warnings should go away.
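
A sketch of that suggestion, under the assumption that the buffer is filled by a small wrapper (fill_hostname is a hypothetical name, not something from the original code): zero the whole buffer first, then give gethostname() one byte less than the buffer holds, so every byte is initialized and the last one is always \0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Fill buf (len bytes) with the local hostname so that every byte is
 * initialized and the result is always \0-terminated.
 * Returns 0 on success, -1 if gethostname() fails.
 * fill_hostname is a hypothetical helper name. */
int fill_hostname(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    /* Zero the whole buffer first: the bytes gethostname() does not
     * write are then defined zeros rather than uninitialized memory. */
    memset(buf, 0, len);

    /* Offer one byte less than the buffer holds, so the final byte
     * stays \0 even if the name is truncated. */
    if (gethostname(buf, len - 1) != 0) {
        return -1;
    }
    return 0;
}
```

Calling fill_hostname(local_hostname, MAX_STRING_LEN) before the MPI_Allgather would then send only initialized bytes, which is the condition I'm guessing valgrind is checking for.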

On Jan 27, 2012, at 8:21 AM, Gabriele Fatigati wrote:

> Hi Jeff,
>
> Yes, that was a very stupid bug in my code, but even with the correction the Valgrind warning in strcmp remains:
>
> ==21779== Conditional jump or move depends on uninitialised value(s)
> ==21779== at 0x4A0898C: strcmp (mc_replace_strmem.c:711)
> ==21779== by 0x400BA8: main (all_gather.c:28)
> ==21779==
> ==21779== Conditional jump or move depends on uninitialised value(s)
> ==21779== at 0x4A0899A: strcmp (mc_replace_strmem.c:711)
> ==21779== by 0x400BA8: main (all_gather.c:28)
> ==21779==
> ==21779== Conditional jump or move depends on uninitialised value(s)
> ==21779== at 0x4A089BA: strcmp (mc_replace_strmem.c:711)
> ==21779== by 0x400BA8: main (all_gather.c:28)
>
>
> Do you get the same warning with Valgrind? The local hostnames are something like "node343", "node344", and so on.
>
>
> 2012/1/27 Jeff Squyres <jsquyres_at_[hidden]>
> I see one problem:
>
> gethostname(local_hostname, sizeof(local_hostname));
>
> That should be:
>
> gethostname(local_hostname, max_name_len);
>
> because sizeof(local_hostname) will be sizeof(void*).
>
> But if that's what you were intending, just to simulate a small hostname buffer, then be aware that gethostname() will not put a \0 after the string, because it'll copy in sizeof(local_hostname) characters and then stop.
>
> Specifically, the man page on OS X says:
>
> The gethostname() function returns the standard host name for the current
> processor, as previously set by sethostname(). The namelen argument
> specifies the size of the name array. The returned name is null-terminated,
> unless insufficient space is provided.
>
> Hence, MPI is transmitting the entire 255 characters in your source array (regardless of content -- MPI is not looking for \0's; you gave it the explicit length of the buffer), but if they weren't filled with \0's, then the receiver's printf will have problems handling it.
>
>
>
> On Jan 27, 2012, at 4:03 AM, Gabriele Fatigati wrote:
>
> > Sorry,
> >
> > this is the right code.
> >
> > 2012/1/27 Gabriele Fatigati <g.fatigati_at_[hidden]>
> > Hi Jeff,
> >
> > The problem arises when I use strcmp on the MPI_Allgather buffer: Valgrind raises a warning.
> >
> > Please check whether the attached code is right; in it, sizeof(local_hostname) is very small.
> >
> > Valgrind is used as:
> >
> > mpirun valgrind --leak-check=full --tool=memcheck ./all_gather
> >
> > and openmpi/1.4.4 compiled with "-O0 -g"
> >
> > Thanks!
> >
> > 2012/1/26 Jeff Squyres <jsquyres_at_[hidden]>
> > I'm not sure what you're asking.
> >
> > The entire contents of hostname[] will be sent -- from position 0 to position (MAX_STRING_LEN-1). If there's a \0 in there, it will be sent. If the \0 occurs after that, then it won't.
> >
> > Be aware that gethostname(buf, size) is not guaranteed to put a \0 in the buffer if the hostname is "size" bytes or longer. So you might want to double check that your gethostname() is returning a \0-terminated string.
> >
> > Does that make sense?
> >
> > Here's a sample I wrote to verify this:
> >
> > #include <stdio.h>
> > #include <string.h>
> > #include <stdlib.h>
> > #include <unistd.h>   /* gethostname() */
> > #include <mpi.h>
> >
> > #define MAX_LEN 64
> >
> > /* Report the position of the first \0 in ptr[0..len-1], if there is one */
> > static void where_null(char *ptr, int len, int rank)
> > {
> >     int i;
> >
> >     for (i = 0; i < len; ++i) {
> >         if ('\0' == ptr[i]) {
> >             printf("Rank %d: Null found at position %d (string: %s)\n",
> >                    rank, i, ptr);
> >             return;
> >         }
> >     }
> >
> >     /* No terminator in range: print byte by byte instead of using %s */
> >     printf("Rank %d: Null not found! (string: ", rank);
> >     for (i = 0; i < len; ++i) putc(ptr[i], stdout);
> >     printf(")\n");
> > }
> >
> > int main(void)
> > {
> >     int i;
> >     char hostname[MAX_LEN];
> >     char *hostname_recv_buf;
> >     int rank, size;
> >
> >     MPI_Init(NULL, NULL);
> >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >     MPI_Comm_size(MPI_COMM_WORLD, &size);
> >
> >     gethostname(hostname, MAX_LEN - 1);
> >     where_null(hostname, MAX_LEN, rank);
> >
> >     /* One MAX_LEN-byte slot per rank; MPI sends all MAX_LEN chars */
> >     hostname_recv_buf = calloc(size * (MAX_LEN), (sizeof(char)));
> >     MPI_Allgather(hostname, MAX_LEN, MPI_CHAR,
> >                   hostname_recv_buf, MAX_LEN, MPI_CHAR, MPI_COMM_WORLD);
> >     for (i = 0; i < size; ++i) {
> >         where_null(hostname_recv_buf + i * MAX_LEN, MAX_LEN, rank);
> >     }
> >
> >     free(hostname_recv_buf);
> >     MPI_Finalize();
> >     return 0;
> > }
> >
> >
> >
> > On Jan 13, 2012, at 2:32 AM, Gabriele Fatigati wrote:
> >
> > > Dear OpenMPI,
> > >
> > > Using MPI_Allgather with the MPI_CHAR type, I have a question about the null terminator. Imagine I want to gather the names of the nodes my program is running on:
> > >
> > >
> > > ----------------------------------------
> > >
> > > char hostname[MAX_STRING_LEN];
> > >
> > > char* hostname_recv_buf=(char*)calloc(num_procs*(MAX_STRING_LEN),(sizeof(char)));
> > >
> > > MPI_Allgather(hostname, MAX_STRING_LEN, MPI_CHAR, hostname_recv_buf, MAX_STRING_LEN, MPI_CHAR, MPI_COMM_WORLD);
> > >
> > > ----------------------------------------
> > >
> > >
> > > Now, is the null terminator of each local string included? Or do I have to send and receive MAX_STRING_LEN+1 elements in MPI_Allgather?
> > >
> > > Under Valgrind, a subsequent simple strcmp:
> > >
> > > for (i = 0; i < num_procs; i++) {
> > >     if (strcmp(&hostname_recv_buf[MAX_STRING_LEN * i], local_hostname) == 0) {
> > >         ... doing something....
> > >     }
> > > }
> > >
> > > raises a warning:
> > >
> > > Conditional jump or move depends on uninitialised value(s)
> > > ==19931== at 0x4A06E5C: strcmp (mc_replace_strmem.c:412)
> > >
> > > The same warning is not present if I use MAX_STRING_LEN+1 in MPI_Allgather.
> > >
> > >
> > > Thanks in advance.
> > >
> > > --
> > > Ing. Gabriele Fatigati
> > >
> > > HPC specialist
> > >
> > > SuperComputing Applications and Innovation Department
> > >
> > > Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
> > >
> > > www.cineca.it Tel: +39 051 6171722
> > >
> > > g.fatigati [AT] cineca.it
> > > _______________________________________________
> > > users mailing list
> > > users_at_[hidden]
> > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> > --
> > Jeff Squyres
> > jsquyres_at_[hidden]
> > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/