Open MPI User's Mailing List Archives

From: Michael Kluskens (michael.kluskens_at_[hidden])
Date: 2006-03-07 16:39:28


On Mar 7, 2006, at 3:23 PM, Michael Kluskens wrote:

> Per the mpi_comm_spawn issues with the 1.0.x releases, I started using
> 1.1r9212. With my sample code I'm getting messages like:
>
> [-:13327] mca: base: component_find: unable to open:
> dlopen(/usr/local/lib/openmpi/mca_pml_teg.so, 9): Symbol not found:
> _mca_ptl_base_recv_request_t_class
> Referenced from: /usr/local/lib/openmpi/mca_pml_teg.so
> Expected in: flat namespace
> (ignored)

I have determined that the above error/warning is caused by
installing openmpi1.1r9212 on a machine where openmpi1.0.1 was
previously installed. I had to manually delete the files in
/usr/local/lib/openmpi and then reinstall. This implies an error
with the 1.1 install script.
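
The manual cleanup described above can be sketched as a small shell
helper. This is only a sketch under stated assumptions: the function
name clean_stale_components, its --delete flag, and the mca_* file
pattern are hypothetical and chosen for illustration; nothing here
ships with Open MPI. By default it only lists what it would remove.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): remove component files
# left behind by an older install so a newer build does not try to
# dlopen() them.
# Usage: clean_stale_components DIR [--delete]
clean_stale_components() {
    dir=$1
    mode=${2:-}
    # Refuse to continue if the component directory does not exist.
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    # Assumption: component files are named mca_*; adjust if needed.
    for f in "$dir"/mca_*; do
        [ -e "$f" ] || continue   # the glob matched nothing
        if [ "$mode" = "--delete" ]; then
            rm -f -- "$f"
            echo "removed $f"
        else
            echo "would remove $f"   # dry run: nothing is deleted
        fi
    done
}
```

Running "clean_stale_components /usr/local/lib/openmpi" previews the
files; adding --delete removes them (this will probably need root),
after which the new version can be reinstalled.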

The following errors/warnings still appear when I run my spawn test
on a clean installation of r9212:

> [-:13323] [0,0,0] ORTE_ERROR_LOG: GPR data corruption in file
> base/soh_base_get_proc_soh.c at line 100
> [-:13323] [0,0,0] ORTE_ERROR_LOG: GPR data corruption in file
> base/oob_base_xcast.c at line 108
> [-:13323] [0,0,0] ORTE_ERROR_LOG: GPR data corruption in file
> base/rmgr_base_stage_gate.c at line 276
> [-:13323] [0,0,0] ORTE_ERROR_LOG: GPR data corruption in file
> base/soh_base_get_proc_soh.c at line 100
> [-:13323] [0,0,0] ORTE_ERROR_LOG: GPR data corruption in file
> base/oob_base_xcast.c at line 108
> [-:13323] [0,0,0] ORTE_ERROR_LOG: GPR data corruption in file
> base/rmgr_base_stage_gate.c at line 276
>
> OS X 10.4.5 with g95 from the current fink install for FC & F77.
> Running on a single machine and launching a single spawned subprocess
> as a test case for now. Also on Debian Sarge on Opteron, built with
> PG 6.1 using:
>
>   ./configure --with-gnu-ld F77=pgf77 FFLAGS=-fastsse FC=pgf90 FCFLAGS=-fastsse
>
> Do these diagnostic messages indicate errors in Open MPI 1.1r9212, or
> are they related to errors in my test code?
>
> Is this information helpful for development purposes?
>
> Michael.
>
> On Mar 4, 2006, at 9:29 AM, Jeff Squyres wrote:
>
>> Michael --
>>
>> Sorry for the delay in replying.
>>
>> Many thanks for your report! You are exactly right -- our types are
>> wrong and will not match in the F90 bindings. I have committed a fix
>> to the trunk for this (it involved changing some types in mpif.h and
>> adding another interface function for MPI_COMM_SPAWN_MULTIPLE so that
>> MPI_ARGVS_NULL could be unambiguously matched).
>>
>> It was also just pointed out to us on this list the other day that we
>> are missing all the places where MPI choice buffers could be of type
>> CHARACTER (e.g., MPI_SEND). We're working on fixing that -- it's
>> just a bunch of menial labor to fix.
>>
>> I'm hesitant to put these fixes in the 1.0.x series simply because
>> we're trying to finish that series and advance towards 1.1. Would
>> you be amenable to using a 1.1.x snapshot? My commit should show up
>> in any 1.1 snapshot >= r9198.
>>
>>
>> On Mar 1, 2006, at 12:30 PM, Michael Kluskens wrote:
>>
>>>
>>> On Mar 1, 2006, at 9:56 AM, George Bosilca wrote:
>>>
>>>> Now that I've looked into this problem more, you're right: it's a
>>>> missing interface. Somehow, it didn't get compiled.
>>>
>>> From "openmpi-1.0.1/ompi/mpi/f90/mpi-f90-interfaces.h" the interface
>>> says:
>>>
>>> subroutine MPI_Comm_spawn(command, argv, maxprocs, info, root, &
>>>     comm, intercomm, array_of_errcodes, ierr)
>>>   use mpi_kinds
>>>   character(len=*), intent(in) :: command
>>>   character(len=*), dimension(*), intent(in) :: argv
>>>   integer, intent(in) :: maxprocs
>>>   integer, intent(in) :: info
>>>   integer, intent(in) :: root
>>>   integer, intent(in) :: comm
>>>   integer, intent(out) :: intercomm
>>>   integer, dimension(*), intent(out) :: array_of_errcodes
>>>   integer, intent(out) :: ierr
>>> end subroutine MPI_Comm_spawn
>>>
>>> My call is (mostly from the Using MPI-2 book):
>>> call MPI_Comm_spawn('subprocess', MPI_ARGV_NULL, universe_size-1, &
>>>     MPI_INFO_NULL, 0, MPI_COMM_WORLD, slavecomm, &
>>>     MPI_ERRCODES_IGNORE, ierr)
>>>
>>> looking at "mpif.h" included by mpi_kinds.f90:
>>>
>>> double complex MPI_ARGV_NULL
>>> integer MPI_INFO_NULL
>>> integer MPI_COMM_WORLD
>>> double complex MPI_ERRCODES_IGNORE
>>>
>>> What I don't understand is how the "double complex" MPI_ARGV_NULL
>>> could work with the "character(len=*), dimension(*), intent(in) ::
>>> argv" interface, or how the "double complex" MPI_ERRCODES_IGNORE
>>> could work with the "integer, dimension(*), intent(out) ::
>>> array_of_errcodes" interface.
>>>
>>> I have the following for my variables:
>>>
>>> integer :: ierr,slavecomm
>>> integer (kind=MPI_ADDRESS_KIND) :: universe_size
>>>
>>> My usage of MPI_ADDRESS_KIND and MPI_Comm_spawn is based on pages
>>> 236 and 244 of "Using MPI-2"
>>>
>>> I'd like to resolve the specific error involving the f90 interfaces
>>> so I can continue to "USE MPI" in order to check my interface errors
>>> quickly as I move forward on my project.
>>>
>>> Michael
>>>
>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>> --
>> {+} Jeff Squyres
>> {+} The Open MPI Project
>> {+} http://www.open-mpi.org/
>>