
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] MPI_Waitany segfaults or (maybe) hangs
From: Francesco Salvadore (francescosalvadore_at_[hidden])
Date: 2011-10-20 13:00:46


Dear Jeff,

Thanks for replying and for providing the MPI implementation details. As you say, the problem is probably a subtle memory bug.

In our code, MPI communication is limited to a few subroutines named cutman_****, all sharing a similar structure that involves a possibly large number (1000 or even more) of non-blocking sends and receives. Segfaults always occur during execution of the cutman_q subroutine.
Using openib, valgrind warns about all of the cutman_**** subroutines, while using TCP only cutman_v gives "uninitialised value" problems.
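
For clarity, the pattern inside each cutman_**** subroutine is roughly the following (a minimal C sketch of the structure only; our real code is Fortran, and the names, arguments and tags here are illustrative, not taken from our sources):

    #include <mpi.h>
    #include <stdlib.h>

    /* Post many non-blocking receives and sends, then complete them one
       at a time with MPI_Waitany; nexch can be 1000 or more in our runs. */
    void exchange_sketch(double **recvbuf, double **sendbuf, int *count,
                         int *peer, int nexch, MPI_Comm comm)
    {
        MPI_Request *req = malloc(2 * nexch * sizeof(MPI_Request));
        int i, idx;

        for (i = 0; i < nexch; i++)
            MPI_Irecv(recvbuf[i], count[i], MPI_DOUBLE, peer[i], i,
                      comm, &req[i]);
        for (i = 0; i < nexch; i++)
            MPI_Isend(sendbuf[i], count[i], MPI_DOUBLE, peer[i], i,
                      comm, &req[nexch + i]);

        /* One MPI_Waitany call per request; completed entries are set to
           MPI_REQUEST_NULL and skipped on later iterations. */
        for (i = 0; i < 2 * nexch; i++)
            MPI_Waitany(2 * nexch, req, &idx, MPI_STATUS_IGNORE);

        free(req);
    }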

As additional information: we tested the code on other clusters that also use the openib protocol, and the runs were always fine; i.e., we see the problem only on one cluster (Opteron Barcelona with InfiniBand: Mellanox Technologies MT25204), while everything seems fine on another cluster (Intel Westmere with network controller: Mellanox Technologies MT26438).

best regards, 
Francesco

----- Original Message -----
From: Jeff Squyres <jsquyres_at_[hidden]>
To: Francesco Salvadore <francescosalvadore_at_[hidden]>; Open MPI Users <users_at_[hidden]>
Cc:
Sent: Thursday, October 20, 2011 2:26 PM
Subject: Re: [OMPI users] MPI_Waitany segfaults or (maybe) hangs

Sorry for the delay in replying. 

Unfortunately, the "uninitialized values" kinds of warnings from valgrind are to be expected when using the OFED stack.  Specifically, a bunch of memory in an OMPI process comes directly from OS-bypass kinds of mechanisms, which effectively translates into valgrind-bypass, too.  Hence, even though the memory *has* been initialized, valgrind didn't "see" it get initialized, so it complains.  :-\

Running with TCP should give much more predictable valgrind results, but there are still some tolerable valgrind warnings that we don't care about.  Specifically, when we write a struct down a file descriptor, sometimes there's an alignment "hole" (e.g., a 2 byte short followed by a 2 byte hole followed by a 4 byte int) that wasn't initialized.  We don't care if such holes are uninitialized.
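
For example, a pattern like the following (a contrived sketch, not actual Open MPI code) generates exactly that kind of warning:

    #include <unistd.h>

    /* A 2-byte short followed by a 2-byte padding "hole" that the
       compiler inserts so the 4-byte int is aligned. */
    struct msg {
        short type;
        int   value;
    };

    void send_msg(int fd)
    {
        struct msg m;
        m.type  = 1;
        m.value = 42;
        /* Both fields are initialized, but the padding bytes are not, so
           valgrind reports something like "Syscall param write(buf) points
           to uninitialised byte(s)" here.  That warning is harmless. */
        write(fd, &m, sizeof(m));
    }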

You said that the program runs correctly with TCP but not with openib.  That could well be explained if there is some subtle memory bug somewhere; the openib and TCP underlying drivers are quite different from each other.  It is very possible that openib interacts in such a way that causes the real bug to be fatal, but TCP interacts with it in a different way that does not cause it to be fatal.

Do the TCP valgrind results show anything illuminating?

On Oct 14, 2011, at 10:47 AM, Francesco Salvadore wrote:

> Dear MPI users,
>
> Using Valgrind, I found that the possible error (which leads to the segfault or hang) comes from:
>
>
> ==10334== Conditional jump or move depends on uninitialised value(s)
> ==10334==    at 0xB150740: btl_openib_handle_incoming (btl_openib_component.c:2888)
> ==10334==    by 0xB1525A2: handle_wc (btl_openib_component.c:3189)
> ==10334==    by 0xB150390: btl_openib_component_progress (btl_openib_component.c:3462)
> ==10334==    by 0x581DDD6: opal_progress (opal_progress.c:207)
> ==10334==    by 0x52A75DE: ompi_request_default_wait_any (req_wait.c:154)
> ==10334==    by 0x52ED449: PMPI_Waitany (pwaitany.c:70)
> ==10334==    by 0x50541BF: MPI_WAITANY (pwaitany_f.c:86)
> ==10334==    by 0x4ECCC1: mpiwaitany_ (parallelutils.f:1374)
> ==10334==    by 0x4ECB18: waitanymessages_ (parallelutils.f:1295)
> ==10334==    by 0x484249: cutman_v_ (grid.f:490)
> ==10334==    by 0x40DE62: MAIN__ (cosa.f:379)
> ==10334==    by 0x40BEFB: main (in /work/ady/fsalvado/CAMPOBASSO/CASPUR_MPI/4_MPI/crashtest-valgrind/cosa.mpi)
> ==10334==
> ==10334== Use of uninitialised value of size 8
> ==10334==    at 0xB150764: btl_openib_handle_incoming (btl_openib_component.c:2892)
> ==10334==    by 0xB1525A2: handle_wc (btl_openib_component.c:3189)
> ==10334==    by 0xB150390: btl_openib_component_progress (btl_openib_component.c:3462)
> ==10334==    by 0x581DDD6: opal_progress (opal_progress.c:207)
> ==10334==    by 0x52A75DE: ompi_request_default_wait_any (req_wait.c:154)
> ==10334==    by 0x52ED449: PMPI_Waitany (pwaitany.c:70)
> ==10334==    by 0x50541BF: MPI_WAITANY (pwaitany_f.c:86)
> ==10334==    by 0x4ECCC1: mpiwaitany_ (parallelutils.f:1374)
> ==10334==    by 0x4ECB18: waitanymessages_ (parallelutils.f:1295)
> ==10334==    by 0x484249: cutman_v_ (grid.f:490)
> ==10334==    by 0x40DE62: MAIN__ (cosa.f:379)
> ==10334==    by 0x40BEFB: main (in /work/ady/fsalvado/CAMPOBASSO/CASPUR_MPI/4_MPI/crashtest-valgrind/cosa.mpi)
>
> Valgrind complains even without using eager_rdma (although the code seems to work in that case), but it complains much less using TCP/IP. There are many other valgrind warnings after these, and I can send the complete valgrind output if needed.
>
> The messages recall something from another thread:
>
> http://www.open-mpi.org/community/lists/users/2010/09/14324.php
>
> which, however, concluded without any direct solution.
>
> Can anyone help me identify the source of the bug (a bug in the code or in MPI)?
>
> thanks
> Francesco
> ________________________________
> From: Francesco Salvadore <francescosalvadore_at_[hidden]>
> To: "users_at_[hidden]" <users_at_[hidden]>
> Sent: Saturday, October 8, 2011 10:06 AM
> Subject: [OMPI users] MPI_Waitany segfaults or (maybe) hangs
>
>
> Dear MPI users,
>
> I am struggling against the bad behaviour of an MPI code. Here is the
> basic information:
>
> a) Intel Fortran 11 or 12 with Open MPI 1.4.1 or 1.4.3 gives the same
> problem. Activating the -traceback compiler option, I see the program stops
> at MPI_Waitany. MPI_Waitany waits for the completion of an array of
> MPI_Irecv requests: looping over the number of array components, at the end
> all receives should be completed.
> The program stops at unpredictable points (after 1, 5, or 24 hours of
> computation). Sometimes I get a SIGSEGV:
>
> mca_btl_openib.so  00002BA74D29D181  Unknown              Unknown  Unknown
> mca_btl_openib.so  00002BA74D29C6FF  Unknown              Unknown  Unknown
> mca_btl_openib.so  00002BA74D29C033  Unknown              Unknown  Unknown
> libopen-pal.so.0  00002BA74835C3E6  Unknown              Unknown  Unknown
> libmpi.so.0        00002BA747E485AD  Unknown              Unknown  Unknown
> libmpi.so.0        00002BA747E7857D  Unknown              Unknown  Unknown
> libmpi_f77.so.0    00002BA747C047C4  Unknown              Unknown  Unknown
> cosa.mpi          00000000004F856B  waitanymessages_        1292  parallelutils.f
> cosa.mpi          00000000004C8044  cutman_q_                2084  bc.f
> cosa.mpi          0000000000413369  smooth_                  2029  cosa.f
> cosa.mpi          0000000000410782  mg_                      810  cosa.f
> cosa.mpi          000000000040FB78  MAIN__                    537  cosa.f
> cosa.mpi          000000000040C1FC  Unknown              Unknown  Unknown
> libc.so.6          00002BA7490AE994  Unknown              Unknown  Unknown
> cosa.mpi          000000000040C109  Unknown              Unknown  Unknown
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 34 with PID 10335 on
> node neo251 exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
>
> Waitanymessages is just a wrapper around MPI_Waitany. Sometimes the run
> stops writing anything to the screen and I do not know what is happening
> (probably MPI_Waitany hangs). Before reaching the segfault or the hang, the
> results are always correct, as checked against the serial version of the
> code.
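>
> For reference, such a wrapper could look roughly like this (a hypothetical
> C sketch, not the actual Fortran routine in parallelutils.f; the name and
> interface are illustrative):
>
>     #include <mpi.h>
>
>     /* Wait, one at a time, for every request in the array to complete.
>        MPI_Waitany sets completed requests to MPI_REQUEST_NULL and returns
>        index = MPI_UNDEFINED once no active requests remain. */
>     void waitanymessages_sketch(int n, MPI_Request *reqs)
>     {
>         int idx;
>         for (;;) {
>             MPI_Waitany(n, reqs, &idx, MPI_STATUS_IGNORE);
>             if (idx == MPI_UNDEFINED)
>                 break;
>         }
>     }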
>
> b) The problem occurs only when using openib (with TCP/IP it works) and only
> when using more than one node on our main cluster. Trying many possible
> workarounds, I found that when running with:
>
> -mca btl_openib_use_eager_rdma 0 -mca btl_openib_max_eager_rdma 0 -mca
> btl_openib_flags 1
>
> the problem seems not to occur.
>
> I would be very thankful to anyone who can help me make sure there is no
> bug in the code and, in any case, discover the reason for such
> "dangerous" behaviour.
>
> I can give any further information if needed, and I apologize if the
> post is not clear or complete enough.
>
> regards,
> Francesco
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/