Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Waitall never returns
From: Ross Boylan (ross_at_[hidden])
Date: 2014-04-10 14:48:51


On 4/9/2014 5:26 PM, Ross Boylan wrote:
> On Fri, 2014-04-04 at 22:40 -0400, George Bosilca wrote:
>> Ross,
>>
>> I'm not familiar with the R implementation you are using, but bear with me and I will explain how you can ask Open MPI about the list of all pending requests on a process. Disclosure: This is Open MPI deep voodoo, an extreme way to debug applications that might save you quite some time.
>>
>> The only thing you need is the communicator you posted your requests into, or at least a pointer to it. Then you attach to your process (or processes) with your preferred debugger and call
>> mca_pml_ob1_dump(struct ompi_communicator_t* comm, int verbose)
>>
>> With gdb this should look like “call mca_pml_ob1_dump(my_comm, 1)”. This will dump human readable information about all the requests pending on a communicator (both sends and receives).
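>>
>> A minimal sketch of such a session (the pid and my_comm here are placeholders for your own process id and communicator pointer):
>>
>>     $ gdb -p <pid>                            # attach to the stuck MPI process
>>     (gdb) call mca_pml_ob1_dump(my_comm, 1)   # verbose dump of pending requests
>>     (gdb) detach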
>>
> Thank you so much for the tip. After inserting a barrier failed to help,
> I decided to try this. After much messing around (details below):
> BTL SM 0x7f615dea9660 endpoint 0x3c15d90 [smp_rank 5] [peer_rank 0]
> BTL SM 0x7f615dea9660 endpoint 0x3b729e0 [smp_rank 5] [peer_rank 1]
> BTL SM 0x7f615dea9660 endpoint 0x3b72ad0 [smp_rank 5] [peer_rank 2]
> BTL SM 0x7f615dea9660 endpoint 0x3c06e60 [smp_rank 5] [peer_rank 3]
> BTL SM 0x7f615dea9660 endpoint 0x3c06f50 [smp_rank 5] [peer_rank 4]
> [n2:10664] [Rank 0]
> [n2:10664] [Rank 1]
> [n2:10664] [Rank 2]
> [n2:10664] [Rank 3]
> [n2:10664] [Rank 4]
> [n2:10664] [Rank 5]
> [n2:10664] [Rank 6]
> [n2:10664] [Rank 7]
> [n2:10664] [Rank 8]
> [n2:10664] [Rank 9]
> [n2:10664] [Rank 10]
> [n2:10664] [Rank 11]
> [n2:10664] [Rank 12]
> [n2:10664] [Rank 13]
After tracing through the code, things seem odd, though in a different way.
First, the output above is out of sequence (a properly ordered copy appears
below).
Second, I think the BTLs are transport mechanisms, or something similar,
not actual messages.
If there were pending messages, they would be listed underneath each BTL
line. There aren't any.

So I think this shows there is nothing to wait on, as I suspected.
Except I seem to be missing info for the remote ranks.

Is there any way a request can be completed absent a Wait or Test on the
request?

Third, I'm seeing BTLs listed for one rank I do communicate with (0) and
four ranks I do not communicate with. Ranks 0-5 are local and the rest
are remote. Rank 5 does communicate with all the remote nodes, but
absolutely nothing is listed for them. When I trace from
  bml_btl->btl->btl_dump(bml_btl->btl, bml_btl->btl_endpoint, verbose)
in mca_pml_ob1_dump, I end up at (viewed with gdb in emacs)

void mca_btl_base_dump(
     struct mca_btl_base_module_t* btl,
     struct mca_btl_base_endpoint_t* endpoint,
     int verbose)
{
}

The function is a no-op, which sort of explains why I'm seeing nothing
for those ranks, but doesn't seem quite right, since the pending
messages are most likely to the remote ranks.

Ross

In-sequence output:
[n2:11695] [Rank 0]
BTL SM 0x7fa37e1b4660 endpoint 0x31a7d70 [smp_rank 5] [peer_rank 0]
[n2:11695] [Rank 1]
BTL SM 0x7fa37e1b4660 endpoint 0x31049e0 [smp_rank 5] [peer_rank 1]
[n2:11695] [Rank 2]
BTL SM 0x7fa37e1b4660 endpoint 0x3104ad0 [smp_rank 5] [peer_rank 2]
[n2:11695] [Rank 3]
BTL SM 0x7fa37e1b4660 endpoint 0x3198e60 [smp_rank 5] [peer_rank 3]
[n2:11695] [Rank 4]
BTL SM 0x7fa37e1b4660 endpoint 0x3198f50 [smp_rank 5] [peer_rank 4]
[n2:11695] [Rank 5]
[n2:11695] [Rank 6]
[n2:11695] [Rank 7]
[n2:11695] [Rank 8]
[n2:11695] [Rank 9]
[n2:11695] [Rank 10]
[n2:11695] [Rank 11]
[n2:11695] [Rank 12]
[n2:11695] [Rank 13]

>
> Not entirely human readable if the human is me!
> Does smp_rank (and peer_rank) correspond to what I would get from MPI_Comm_rank? I
> hope so, because I was aiming for rank 5.
> How do I know whether an entry is a send or a receive? They should all be sends.
>
> What are all the lines like
> [n2:10664] [Rank 7]?
>
> What this seems to show is very odd.
> First, my code thinks there are 3 outstanding Isends. Does this report
> include requests that have become inactive (because they completed)?
>
> Second, during normal operations rank 5 does not talk to ranks 1-4.
> I did put an MPI_Barrier in just before shutdown, but the trace
> information indicates rank 5 never gets to that step.
>
> To provide fuller context, and maybe some clues to others who attempt
> this, I first tried this with my non-debug-enabled libraries. I guessed
> that the ranks were in the same order as the process IDs and invoked
> gdb on my R executable, giving it the process ID (once the system
> reached its stuck state).
>
> Accessing the communicator was tricky; it goes through the comm variable
> defined in the Rmpi library. Overall, the R executable starts and loads
> the Rmpi library, which in turn loads and references the MPI library.
> The communicators are declared in Rmpi as MPI_Comm *comm, and the one I
> need is comm[1].
>
> When I tried to reference it I got an error that there was no debugging
> info. I reconfigured MPI with --enable-debug and rebuilt it (make
> clean all install). Then I launched everything again; I did not rebuild
> Rmpi against the debug libraries, though I installed the debug libraries
> in the old location for the regular ones.
>
> I still had problems:
> (gdb) p comm[1]
> cannot subscript something of type `<data variable, no debug info>'
> The error message I got before rebuilding MPI with debug was a bit
> different and stronger.
>
> I realized that comm was a symbol in Rmpi, which I had not built with
> debug symbols. Since MPI_Comm should now be understood by the debugger,
> I tried an explicit cast, which worked:
> call mca_pml_ob1_dump(((MPI_Comm *) comm)[1], 1)
>
> So I'm not entirely sure whether building a debug version of MPI was
> actually necessary.
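>
> For anyone else attempting this, a rough sketch of the whole session (the
> pid is a placeholder; comm[1] assumes Rmpi's layout of its communicator
> array, as described above):
>
>     $ gdb -p <pid>                                            # attach to the stuck R worker
>     (gdb) call mca_pml_ob1_dump(((MPI_Comm *) comm)[1], 1)    # cast Rmpi's comm symbol and dump
>     (gdb) detach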
>
> Ross
>> If you are right, all processes will report NONE, and the bug is somewhere in-between your application and the MPI library. Otherwise, you might have some not-yet-completed requests pending…
>>
>> George.
>>
>>
>> On Apr 4, 2014, at 22:20 , Ross Boylan <ross_at_[hidden]> wrote:
>>
>>> On 4/4/2014 6:01 PM, Ralph Castain wrote:
>>>> It sounds like you don't have a balance between sends and recvs somewhere - i.e., some apps send messages, but the intended recipient isn't issuing a recv and waiting until the message has been received before exiting. If the recipient leaves before the isend completes, then the isend will never complete and the waitall will not return.
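>>>>
>>>> A hedged sketch of that failure mode (illustrative only, not your code; the 1 MB buffer is just meant to be big enough to force the rendezvous rather than the eager protocol):
>>>>
>>>>     char buf[1 << 20];
>>>>     MPI_Request req;
>>>>     if (rank == 0) {
>>>>         MPI_Isend(buf, sizeof buf, MPI_BYTE, 1, 0, MPI_COMM_WORLD, &req);
>>>>         MPI_Waitall(1, &req, MPI_STATUSES_IGNORE);  /* blocks forever: no matching recv */
>>>>     } else if (rank == 1) {
>>>>         /* no MPI_Recv is ever posted; this rank just proceeds to shutdown */
>>>>     }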
>>> I'm pretty sure the sends complete because I wait on something that can only be computed after the sends complete, and I know I have that result.
>>>
>>> My current theory is that my modifications to Rmpi are not properly tracking all completed messages, resulting in it thinking there are outstanding messages (and passing a positive count to the C-level MPI_Waitall with associated garbagey arrays). But I haven't isolated the problem.
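>>>
>>> For reference, a hedged sketch (not Rmpi's actual code; the names are made up) of the bookkeeping the C-level MPI_Waitall expects:
>>>
>>>     #include <mpi.h>
>>>
>>>     #define NSENDS 3   /* hypothetical number of posted Isends */
>>>
>>>     static void drain_sends(void *bufs[NSENDS], int lens[NSENDS],
>>>                             int dest, int tag, MPI_Comm comm)
>>>     {
>>>         MPI_Request reqs[NSENDS];
>>>         for (int i = 0; i < NSENDS; i++)
>>>             MPI_Isend(bufs[i], lens[i], MPI_BYTE, dest, tag, comm, &reqs[i]);
>>>         /* Every slot handed to Waitall must hold a live request or
>>>            MPI_REQUEST_NULL (null entries are ignored); an uninitialized
>>>            "garbagey" handle is undefined behaviour and can hang or crash. */
>>>         MPI_Waitall(NSENDS, reqs, MPI_STATUSES_IGNORE);
>>>     }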
>>>
>>> Ross
>>>>
>>>> On Apr 4, 2014, at 5:20 PM, Ross Boylan <ross_at_[hidden]> wrote:
>>>>
>>>>> During shutdown of my application the processes issue a waitall, since they have done some Isends. A couple of them never return from that call.
>>>>>
>>>>> Could this be the result of some of the processes already being shutdown (the processes with the problem were late in the shutdown sequence)? If so, what is the recommended solution? A barrier?
>>>>>
>>>>> The shutdown proceeds in stages, but the processes in question are not told to shutdown until all the messages they have sent have been received. So there shouldn't be any outstanding messages from them.
>>>>>
>>>>> My reading of the manual is that Waitall with a count of 0 should return immediately, not hang. Is that correct?
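>>>>>
>>>>> (For what it's worth, a minimal check of that reading would be a call with nothing outstanding, e.g.
>>>>>
>>>>>     MPI_Waitall(0, NULL, MPI_STATUSES_IGNORE);   /* count 0: nothing to wait on, should return at once */
>>>>>
>>>>> so if even that hangs, the count or request array actually reaching MPI_Waitall is presumably not what I think it is.)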
>>>>>
>>>>> Running under R with openmpi 1.7.4.