Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RFC MPI 2.2 Dist_graph addition
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2013-07-01 09:51:39


George --

All 4 tests fail for me -- can you have a look?

-----
[6:50] savbu-usnic-a:~/s/o/dist_graph ❯❯❯ mpirun --mca btl tcp,sm,self --host mpi001,mpi002,mpi003,mpi004 -np 5 --bynode distgraph_test_1
[mpi002:5304] *** An error occurred in MPI_Dist_graph_create
[mpi002:5304] *** reported by process [46910457249793,46909632806913]
[mpi002:5304] *** on communicator MPI_COMM_WORLD
[mpi002:5304] *** MPI_ERR_OTHER: known error not in list
[mpi002:5304] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[mpi002:5304] *** and potentially your MPI job)
[savbu-usnic-a:24610] 4 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[savbu-usnic-a:24610] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[6:50] savbu-usnic-a:~/s/o/dist_graph ❯❯❯ mpirun --mca btl tcp,sm,self --host mpi001,mpi002,mpi003,mpi004 -np 5 --bynode distgraph_test_2
[mpi002:5316] *** An error occurred in MPI_Dist_graph_create_adjacent
[mpi002:5316] *** reported by process [46910457053185,46909632806913]
[mpi002:5316] *** on communicator MPI_COMM_WORLD
[mpi002:5316] *** MPI_ERR_OTHER: known error not in list
[mpi002:5316] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[mpi002:5316] *** and potentially your MPI job)
[savbu-usnic-a:24615] 4 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[savbu-usnic-a:24615] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[6:51] savbu-usnic-a:~/s/o/dist_graph ❯❯❯ mpirun --mca btl tcp,sm,self --host mpi001,mpi002,mpi003,mpi004 -np 5 --bynode distgraph_test_3
[mpi001:5338] *** An error occurred in MPI_Dist_graph_create_adjacent
[mpi001:5338] *** reported by process [46910469242881,46909632806916]
[mpi001:5338] *** on communicator MPI_COMM_WORLD
[mpi001:5338] *** MPI_ERR_OTHER: known error not in list
[mpi001:5338] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[mpi001:5338] *** and potentially your MPI job)
[savbu-usnic-a:24797] 4 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[savbu-usnic-a:24797] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[6:51] savbu-usnic-a:~/s/o/dist_graph ❯❯❯ mpirun --mca btl tcp,sm,self --host mpi001,mpi002,mpi003,mpi004 -np 5 --bynode distgraph_test_4
[mpi001:5351] *** An error occurred in MPI_Dist_graph_create
[mpi001:5351] *** reported by process [46910442110977,46909632806912]
[mpi001:5351] *** on communicator MPI_COMM_WORLD
[mpi001:5351] *** MPI_ERR_OTHER: known error not in list
[mpi001:5351] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[mpi001:5351] *** and potentially your MPI job)
[savbu-usnic-a:24891] 4 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[savbu-usnic-a:24891] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[6:51] savbu-usnic-a:~/s/o/dist_graph ❯❯❯
-----

On Jul 1, 2013, at 8:41 AM, George Bosilca <bosilca_at_[hidden]> wrote:

> The patch has been pushed into the trunk in r28687.
>
> George.
>
>
> On Jul 1, 2013, at 13:55 , George Bosilca <bosilca_at_[hidden]> wrote:
>
>> Guys,
>>
>> Thanks for the patch and for the tests. All these changes/cleanups are correct; I have incorporated them all in the patch. Please find below the new patch.
>>
>> As the deadline for the RFC is today, I'll move forward and push the changes into the trunk, and if there are still issues we can work them out directly in the trunk.
>>
>> Thanks,
>> George.
>>
>> PS: I will push your tests in our tests base as well.
>>
>>
>> On Jul 1, 2013, at 06:39 , "Kawashima, Takahiro" <t-kawashima_at_[hidden]> wrote:
>>
>>> George,
>>>
>>> My colleague was working on your ompi-topo Bitbucket repository,
>>> but that work is not finished. However, he found bugs in the patch
>>> attached to your previous mail and created a patch that fixes them.
>>> See the attached patch, which applies against the Open MPI trunk plus your patch.
>>>
>>> His test programs are also attached. test_1 and test_2 can run
>>> with nprocs=5, and test_3 and test_4 can run with nprocs>=3.
>>>
>>> I'm not familiar with the details of the patch or the test
>>> programs, but I can ask him if you have any questions.
>>>
>>> Regards,
>>> Takahiro Kawashima,
>>> MPI development team,
>>> Fujitsu
>>>
>>>> WHAT: Support for MPI 2.2 dist_graph
>>>>
>>>> WHY: To become [almost entirely] MPI 2.2 compliant
>>>>
>>>> WHEN: Monday July 1st
>>>>
>>>> As discussed during the last phone call, a missing piece of functionality from the MPI 2.2 standard (the distributed graph topology) is ready for prime time. The attached patch provides a minimal version (no components supporting reordering) that will complete the topology support in Open MPI.
>>>>
>>>> It is a somewhat major change compared with what we had before, and it completely reshapes the way we deal with topologies. Whereas our topology components were mainly storage (for example, they could not create the new communicator), the new version is built around a [possibly] common representation (in mca/topo/topo.h), but the functions to attach and retrieve the topological information are specific to each component. As a result, the ompi_create_cart and ompi_create_graph functions became useless and have been removed.
>>>>
>>>> In addition to adding the internal infrastructure to manage the topology information, it updates the MPI interface and the debugger support, and it provides all the Fortran interfaces. From a correctness point of view, it passes all the tests we have in ompi-tests for the cart and graph topologies, as well as some tests/applications for the dist_graph interface.
>>>>
>>>> I don't think there is a need for a long wait on this one, so I would like to propose a short deadline: a week from now, on Monday July 1st. A patch based on Open MPI trunk r28670 is attached below.
>>> <dist-graph-fix.patch><dist-graph-test.tar.gz>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>

-- 
Jeff Squyres
jsquyres_at_[hidden]