
Open MPI Development Mailing List Archives


From: George Bosilca (bosilca_at_[hidden])
Date: 2007-08-29 18:05:59

On Aug 29, 2007, at 7:05 PM, Andrew Friedley wrote:

> $ mpirun -debug -np 2 -bynode -debug-daemons ./NPmpi
> ----------------------------------------------------------------------
> ----
> Internal error -- the orte_base_user_debugger MCA parameter was not
> able to
> be found. Please contact the Open RTE developers; this should not
> happen.
> ----------------------------------------------------------------------
> ----
> Grepping for that param in ompi_info shows:
> MCA orte: parameter "orte_base_user_debugger" (current value:
> "totalview @mpirun@ -a @mpirun_args@ : ddt -n @np@
> -start @executable@ @executable_argv@
> @single_app@ :
> fxp @mpirun@ -a @mpirun_args@")

This has been broken for a while. It's a long story to explain, but a
fix is on the way.

Until then, you should use the following command: "tv8 mpirun -a -np
2 -bynode `pwd`/NPmpi". The `pwd` is really important for some
reason; without it, TotalView is unable to find the executable. The
problem is that the name of the process will be "./NPmpi", and
TotalView does not have access to the path from which the executable
was launched (at least, that's the reason I think).

Once you do this, you should be good to go.


> What's going on? I also tried running totalview directly, using a
> line
> like this:
> totalview mpirun -a -np 2 -bynode -debug-daemons ./NPmpi
> TotalView comes up and seems to be debugging the mpirun process,
> with only one thread. It doesn't seem to be aware that this is an
> MPI job with other MPI processes. Any ideas?
> Andrew
> George Bosilca wrote:
>> The first step will be to figure out which version of the alltoall
>> you're using. I suppose you're using the default parameters, in
>> which case the decision function in the tuned component says to use
>> the linear alltoall. As the name states, this means that every node
>> will post one receive from every other node and then start sending
>> the respective fragment to every other node. This leads to a lot of
>> outstanding sends and receives. I doubt the receives can cause a
>> problem, so I expect the problem is coming from the send side.
>> Do you have TotalView installed on odin? If yes, there is a simple
>> way to see how many sends are pending and where... That might
>> pinpoint [at least] the process where you should look to see what's
>> wrong.
>> george.
>> On Aug 29, 2007, at 12:37 AM, Andrew Friedley wrote:
>>> I'm having a problem with the UD BTL and hoping someone might have
>>> some
>>> input to help solve it.
>>> What I'm seeing is hangs when running alltoall benchmarks with
>>> nbcbench
>>> or an LLNL program called mpiBench -- both hang exactly the same
>>> way.
>>> With the code on the trunk running nbcbench on IU's odin using 32
>>> nodes
>>> and a command line like this:
>>> mpirun -np 128 -mca btl ofud,self ./nbcbench -t MPI_Alltoall -p
>>> 128-128
>>> -s 1-262144
>>> hangs consistently when testing 256-byte messages. There are two
>>> things
>>> I can do to make the hang go away until running at larger scale.
>>> First
>>> is to increase the 'btl_ofud_sd_num' MCA param from its default
>>> value of
>>> 128. This allows you to run with more procs/nodes before hitting
>>> the
>>> hang, but AFAICT doesn't fix the actual problem. What this
>>> parameter
>>> does is control the maximum number of outstanding send WQEs
>>> posted at
>>> the IB level -- when the limit is reached, frags are queued on an
>>> opal_list_t and later sent by progress as IB sends complete.
>>> The other way I've found is to play games with calling
>>> mca_btl_ud_component_progress() in
>>> mca_btl_ud_endpoint_post_send(). In fact I replaced the
>>> CHECK_FRAG_QUEUES() macro used around btl_ofud_endpoint.c:77 with a
>>> version that loops on progress until a send WQE slot is available
>>> (as opposed to queueing). Same result -- I can run at larger
>>> scale, but still hit the hang eventually.
>>> It appears that when the job hangs, progress is being polled very
>>> quickly, and after spinning for a while there are no outstanding
>>> send
>>> WQEs or queued sends in the BTL. I'm not sure where further up
>>> things
>>> are spinning/blocking, as I can't produce the hang at less than 32
>>> nodes
>>> / 128 procs and don't have a good way of debugging that (suggestions
>>> appreciated).
>>> Furthermore, both ob1 and dr PMLs result in the same behavior,
>>> except
>>> that DR eventually trips a watchdog timeout, fails the BTL, and
>>> terminates the job.
>>> Other collectives such as allreduce and allgather do not hang --
>>> only
>>> alltoall. I can also reproduce the hang on LLNL's Atlas machine.
>>> Can anyone else reproduce this (Torsten might have to make a copy of
>>> nbcbench available)? Anyone have any ideas as to what's wrong?
>>> Andrew
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]