Open MPI User's Mailing List Archives

From: Doug Gregor (dgregor_at_[hidden])
Date: 2006-06-29 17:42:39


On Jun 29, 2006, at 5:23 PM, Graham E Fagg wrote:

> Hi Doug
> wow, looks like some messages are getting lost (or even delivered
> to the wrong peer on the same node.. ) Could you also try with:
>
> -mca coll_base_verbose 1 -mca coll_tuned_use_dynamic_rules 1 -mca
> coll_tuned_bcast_algorithm <1,2,3,4,5,6>
>
> The values 1-6 control which topology/algorithm is used internally..
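
For reference, a complete command line with those flags would look
something like this (the process count and the ./bcast_test binary
name are placeholders for whatever job is actually being run; the mca
flags are exactly the ones quoted above):

    mpirun -np 4 -mca coll_base_verbose 1 \
           -mca coll_tuned_use_dynamic_rules 1 \
           -mca coll_tuned_bcast_algorithm 3 ./bcast_test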

The results are... very odd. With algorithms 1-5, everything seems
to be okay: I ran a couple of trials of each and never had it hang.

When I use algorithm 6, I get:

[odin003.cs.indiana.edu:14174] *** An error occurred in MPI_Bcast
[odin005.cs.indiana.edu:10510] *** An error occurred in MPI_Bcast
Broadcasting integers from root 0...[odin004.cs.indiana.edu:11752] *** An error occurred in MPI_Bcast
[odin003.cs.indiana.edu:14174] *** on communicator MPI_COMM_WORLD
[odin005.cs.indiana.edu:10510] *** on communicator MPI_COMM_WORLD
[odin005.cs.indiana.edu:10510] *** MPI_ERR_ARG: invalid argument of some other kind
[odin005.cs.indiana.edu:10510] *** MPI_ERRORS_ARE_FATAL (goodbye)
[odin002.cs.indiana.edu:05866] *** An error occurred in MPI_Bcast
[odin004.cs.indiana.edu:11752] *** on communicator MPI_COMM_WORLD
[odin003.cs.indiana.edu:14174] *** MPI_ERR_ARG: invalid argument of some other kind
[message repeated many times for the different processes]
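
For context, the test that prints "Broadcasting integers from root
0..." is essentially just a plain integer broadcast; a minimal sketch
of that kind of program (hypothetical, not the actual test code) is:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, value = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            printf("Broadcasting integers from root 0...");
            fflush(stdout);
            value = 17;   /* arbitrary payload */
        }

        /* All ranks call MPI_Bcast with root 0; with
           coll_tuned_bcast_algorithm 6 this is the call that aborts. */
        MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

        MPI_Finalize();
        return 0;
    }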

Are there other settings I can tweak to find out which algorithm
it's deciding to use at run time?
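
(For what it's worth, ompi_info can at least enumerate the tuned
component's registered parameters; if I have the syntax right, this
lists the knobs, though I don't believe it shows the run-time
decision itself:)

    ompi_info --param coll tuned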

        Cheers,
        Doug