Open MPI User's Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

From: Michael Kluskens (michael.kluskens_at_[hidden])
Date: 2006-03-07 16:39:28


On Mar 7, 2006, at 3:23 PM, Michael Kluskens wrote:

> Per the mpi_comm_spawn issues with the 1.0.x releases I started using
> 1.1r9212. With my sample code I'm getting messages like:
>
> [-:13327] mca: base: component_find: unable to open:
> dlopen(/usr/local/lib/openmpi/mca_pml_teg.so, 9): Symbol not found:
> _mca_ptl_base_recv_request_t_class
> Referenced from: /usr/local/lib/openmpi/mca_pml_teg.so
> Expected in: flat namespace
> (ignored)

I have determined that the above error/warning is caused by
installing Open MPI 1.1r9212 on a machine where Open MPI 1.0.1 was
previously installed. I had to manually delete the files in
/usr/local/lib/openmpi and then reinstall. This implies an error in
the 1.1 install script.
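
For anyone hitting the same stale-component problem, the cleanup step can be sketched roughly as below. This is only an illustration, not part of Open MPI: the function name remove_stale_mca_components is hypothetical, the default directory is simply the one from this report, and limiting the removal to mca_* files is an assumption (the manual fix described above deleted every file in the directory).

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Open MPI): remove leftover MCA
# component files from an older install before reinstalling.
# Assumption: components live in <prefix>/lib/openmpi and are named mca_*.
remove_stale_mca_components() {
    plugin_dir="${1:-/usr/local/lib/openmpi}"

    if [ ! -d "$plugin_dir" ]; then
        echo "no plugin directory at $plugin_dir, nothing to do"
        return 0
    fi

    # Only mca_* files are removed; other files in the directory are kept.
    for f in "$plugin_dir"/mca_*; do
        [ -e "$f" ] || continue   # glob matched nothing
        echo "removing $f"
        rm -f "$f"
    done
}

# Intended use, run with sufficient privileges, then reinstall:
#   remove_stale_mca_components /usr/local/lib/openmpi
#   make install
```

The block only defines the function, so nothing is deleted until it is called explicitly with the directory to clean.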

The following errors/warnings still appear when running my spawn test
on a clean installation of r9212:

> [-:13323] [0,0,0] ORTE_ERROR_LOG: GPR data corruption in file
> base/soh_base_get_proc_soh.c at line 100
> [-:13323] [0,0,0] ORTE_ERROR_LOG: GPR data corruption in file
> base/oob_base_xcast.c at line 108
> [-:13323] [0,0,0] ORTE_ERROR_LOG: GPR data corruption in file
> base/rmgr_base_stage_gate.c at line 276
> [-:13323] [0,0,0] ORTE_ERROR_LOG: GPR data corruption in file
> base/soh_base_get_proc_soh.c at line 100
> [-:13323] [0,0,0] ORTE_ERROR_LOG: GPR data corruption in file
> base/oob_base_xcast.c at line 108
> [-:13323] [0,0,0] ORTE_ERROR_LOG: GPR data corruption in file
> base/rmgr_base_stage_gate.c at line 276
>
> OS X 10.4.5 with g95 from current fink install for FC & F77. Running
> on a single machine and launching a single spawned subprocess as a
> test case for now. Also on Debian Sarge on Opteron, built with PG 6.1
> using:
>   ./configure --with-gnu-ld F77=pgf77 FFLAGS=-fastsse FC=pgf90 FCFLAGS=-fastsse
>
> Do these diagnostic messages indicate errors in Open MPI 1.1r9212, or
> are they related to errors in my test code?
>
> Is this information helpful for development purposes?
>
> Michael.
>
> On Mar 4, 2006, at 9:29 AM, Jeff Squyres wrote:
>
>> Michael --
>>
>> Sorry for the delay in replying.
>>
>> Many thanks for your report! You are exactly right -- our types are
>> wrong and will not match in the F90 bindings. I have committed a fix
>> to the trunk for this (it involved changing some types in mpif.h and
>> adding another interface function for MPI_COMM_SPAWN_MULTIPLE so that
>> MPI_ARGVS_NULL could be unambiguously matched).
>>
>> It was also just pointed out to us on this list the other day that we
>> are missing all the places where MPI choice buffers could be of type
>> CHARACTER (e.g., MPI_SEND). We're working on fixing that -- it's
>> just a bunch of menial labor to fix.
>>
>> I'm hesitant to put these fixes in the 1.0.x series simply because
>> we're trying to finish that series and advance towards 1.1. Would
>> you be amenable to using a 1.1.x snapshot? My commit should show up
>> in any 1.1 snapshot >= r9198.
>>
>>
>> On Mar 1, 2006, at 12:30 PM, Michael Kluskens wrote:
>>
>>>
>>> On Mar 1, 2006, at 9:56 AM, George Bosilca wrote:
>>>
>>>> Now that I've looked into this problem more, you're right: it's a
>>>> missing interface. Somehow, it didn't get compiled.
>>>
>>> From "openmpi-1.0.1/ompi/mpi/f90/mpi-f90-interfaces.h" the interface
>>> says:
>>>
>>> subroutine MPI_Comm_spawn(command, argv, maxprocs, info, root, &
>>>     comm, intercomm, array_of_errcodes, ierr)
>>>   use mpi_kinds
>>>   character(len=*), intent(in) :: command
>>>   character(len=*), dimension(*), intent(in) :: argv
>>>   integer, intent(in) :: maxprocs
>>>   integer, intent(in) :: info
>>>   integer, intent(in) :: root
>>>   integer, intent(in) :: comm
>>>   integer, intent(out) :: intercomm
>>>   integer, dimension(*), intent(out) :: array_of_errcodes
>>>   integer, intent(out) :: ierr
>>> end subroutine MPI_Comm_spawn
>>>
>>> My call is (mostly from the Using MPI-2 book):
>>> call MPI_Comm_spawn('subprocess', MPI_ARGV_NULL, universe_size-1, &
>>>     MPI_INFO_NULL, 0, &
>>>     MPI_COMM_WORLD, slavecomm, MPI_ERRCODES_IGNORE, ierr )
>>>
>>> looking at "mpif.h" included by mpi_kinds.f90:
>>>
>>> double complex MPI_ARGV_NULL
>>> integer MPI_INFO_NULL
>>> integer MPI_COMM_WORLD
>>> double complex MPI_ERRCODES_IGNORE
>>>
>>> What I don't understand is how the "double complex" MPI_ARGV_NULL
>>> could work with the "character(len=*), dimension(*), intent(in) ::
>>> argv" interface, or how the "double complex" MPI_ERRCODES_IGNORE
>>> could work with the "integer, dimension(*), intent(out) ::
>>> array_of_errcodes" interface.
>>>
>>> I have the following for my variables:
>>>
>>> integer :: ierr,slavecomm
>>> integer (kind=MPI_ADDRESS_KIND) :: universe_size
>>>
>>> My usage of MPI_ADDRESS_KIND and MPI_Comm_spawn is based on pages
>>> 236 and 244 of "Using MPI-2"
>>>
>>> I'd like to resolve the specific error involving the f90 interfaces
>>> so I can continue to "USE MPI" in order to check my interface errors
>>> quickly as I move forward on my project.
>>>
>>> Michael
>>>
>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>> --
>> {+} Jeff Squyres
>> {+} The Open MPI Project
>> {+} http://www.open-mpi.org/
>>