While debugging dynamic/intercomm_create from the ibm test suite, I found something odd.
I ran *without* any batch manager on a VM (one socket and four CPUs):
mpirun -np 1 ./dynamic/intercomm_create
It hangs by default; it works with --mca coll ^ml.
- task 0 spawns task 1
- task 0 spawns task 2
- a communicator is created for the 3 tasks via MPI_Intercomm_create() (a rough sketch of this flow follows right below)
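For reference, here is a rough, self-contained sketch of that flow. This is *not* the ibm test itself; the file name, the tag value and the "child1"/"child2" arguments are made up, and the real test certainly differs in its details:

/* sketch.c - rough approximation of the 3-task scenario above,
 * NOT the actual ibm test; names and the tag are arbitrary.
 * build: mpicc sketch.c -o sketch
 * run:   mpirun -np 1 ./sketch
 */
#include <mpi.h>
#include <string.h>

#define TAG 42

int main(int argc, char *argv[])
{
    MPI_Comm parent, inter, m01, m02, inter012;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (MPI_COMM_NULL == parent) {
        /* task 0: spawn task 1 and then task 2, one process each */
        char *args1[] = { "child1", NULL };
        char *args2[] = { "child2", NULL };

        MPI_Comm_spawn(argv[0], args1, 1, MPI_INFO_NULL, 0,
                       MPI_COMM_SELF, &inter, MPI_ERRCODES_IGNORE);
        MPI_Intercomm_merge(inter, 0, &m01);   /* intracomm {task0, task1} */

        MPI_Comm_spawn(argv[0], args2, 1, MPI_INFO_NULL, 0,
                       MPI_COMM_SELF, &inter, MPI_ERRCODES_IGNORE);
        MPI_Intercomm_merge(inter, 0, &m02);   /* intracomm {task0, task2} */

        /* inter-communicator between {task0,task1} and {task2}; the two
         * leaders (task 0 and task 2) talk through the peer comm m02 */
        MPI_Intercomm_create(m01, 0, m02, 1, TAG, &inter012);
    } else if (0 == strcmp(argv[1], "child1")) {
        /* task 1 */
        MPI_Intercomm_merge(parent, 1, &m01);
        /* peer_comm/remote_leader are ignored at non-leader ranks */
        MPI_Intercomm_create(m01, 0, MPI_COMM_SELF, 0, TAG, &inter012);
    } else {
        /* task 2 */
        MPI_Intercomm_merge(parent, 1, &m02);
        MPI_Intercomm_create(MPI_COMM_WORLD, 0, m02, 0, TAG, &inter012);
    }

    MPI_Finalize();
    return 0;
}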
MPI_Intercomm_create() calls ompi_comm_get_rprocs(), which calls ...
Then, on task 1, ompi_proc_set_locality() calls:
- opal_dstore.fetch(opal_dstore_internal, "task 2"->proc_name, ...), which fails, and this is OK
- opal_dstore.fetch(opal_dstore_nonpeer, "task 2"->proc_name, ...), which fails, and this is *not* OK
(on task 2, the first fetch for "task 1" fails but the second one succeeds)
My analysis is that when task 2 was created, it updated its opal_dstore_nonpeer with info about "task 1", which had previously been spawned by task 0.
When task 1 was spawned, task 2 did not exist yet, so task 1's opal_dstore_nonpeer contains no reference to task 2. But when task 2 was later spawned, task 1's opal_dstore_nonpeer was not updated, hence the failure.
(On task 1, the proc_flags of task 2 therefore have incorrect locality; this likely confuses coll ml and hangs the test.)
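Said differently, the asymmetry looks like this toy model (made-up structures and names, not the opal_dstore API): task 2 starts life with task 1 already in its nonpeer store, while task 1's stores were filled before task 2 existed and are never refreshed:

/* toy model of the two-level lookup, NOT the real opal_dstore API */
#include <stdio.h>
#include <string.h>

/* each task has an "internal" (peer) store and a "nonpeer" store,
 * modelled here as NULL-terminated lists of task names */
struct store { const char *entries[4]; };

static int fetch(const struct store *s, const char *name)
{
    for (int i = 0; s->entries[i] != NULL; i++)
        if (0 == strcmp(s->entries[i], name)) return 0;  /* found */
    return -1;                                           /* not found */
}

int main(void)
{
    /* task 1 was spawned before task 2 existed, and was never updated */
    struct store task1_internal = { { "task 0", NULL } };
    struct store task1_nonpeer  = { { NULL } };

    /* task 2 was spawned after task 1, so it learned about it at startup */
    struct store task2_internal = { { "task 0", NULL } };
    struct store task2_nonpeer  = { { "task 1", NULL } };

    /* task 1 looks up task 2: internal fails (OK), nonpeer also fails (not OK) */
    printf("task1 -> task2: internal=%d nonpeer=%d\n",
           fetch(&task1_internal, "task 2"), fetch(&task1_nonpeer, "task 2"));

    /* task 2 looks up task 1: internal fails, but nonpeer succeeds */
    printf("task2 -> task1: internal=%d nonpeer=%d\n",
           fetch(&task2_internal, "task 1"), fetch(&task2_nonpeer, "task 1"));
    return 0;
}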
Should task 1 have received new information when task 2 was spawned?
Should task 2 have sent information to task 1 when it was spawned?
Should task 1 have (tried to) fetch fresh information before invoking MPI_Intercomm_create()?
Incidentally, I found that ompi_proc_set_locality() calls opal_dstore.store with identifier &proc (the argument is &proc->proc_name everywhere else), so this is likely a bug/typo. The attached patch fixes this.
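For what it is worth, the reason the typo matters: proc is a pointer argument, so &proc is the address of the local pointer variable, not of the process name, and the store and the later fetches end up keyed on different bytes. A contrived illustration (made-up types and store, not the OMPI code; it assumes 64-bit pointers):

/* contrived illustration of the &proc vs &proc->proc_name mix-up;
 * the types and this one-slot "store" are made up, not OMPI code */
#include <stdio.h>
#include <string.h>
#include <stdint.h>

typedef struct { uint64_t proc_name; int flags; } proc_t;

/* one-slot store keyed on the 8 bytes found at the given address */
static uint64_t stored_key;

static void store(const void *id) { memcpy(&stored_key, id, sizeof(stored_key)); }
static int  fetch(const void *id) { return 0 == memcmp(&stored_key, id, sizeof(stored_key)); }

int main(void)
{
    proc_t p = { .proc_name = 42, .flags = 0 };
    proc_t *proc = &p;

    store(&proc);            /* the typo: keys on the pointer variable's bytes */
    printf("fetch keyed on proc_name: %s\n",
           fetch(&proc->proc_name) ? "hit" : "miss");   /* miss */

    store(&proc->proc_name); /* the fix: keys on the process name itself */
    printf("fetch keyed on proc_name: %s\n",
           fetch(&proc->proc_name) ? "hit" : "miss");   /* hit */
    return 0;
}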
Thanks in advance for your feedback,