Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI_Spawn error: "Data unpack would read past end of buffer" (-26) instead of "Success"
From: Simone Pellegrini (spellegrini_at_[hidden])
Date: 2011-09-06 15:20:27


On 09/06/2011 04:58 PM, Ralph Castain wrote:
> On Sep 6, 2011, at 12:49 PM, Simone Pellegrini wrote:
>
>> On 09/06/2011 02:57 PM, Ralph Castain wrote:
>>> Hi Simone
>>>
>>> Just to clarify: is your application threaded? Could you please send the OMPI configure cmd you used?
>> yes, it is threaded. There are basically three threads: one for outgoing messages (MPI_Send), one for incoming messages (MPI_Iprobe / MPI_Recv), and one for spawning (a minimal sketch of this layout is shown after this message).
>>
>> I am not sure what you mean by the OMPI configure cmd I used... I simply do mpirun --np 1 ./executable
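
A minimal sketch of the three-thread layout described above, assuming the application requests MPI_THREAD_MULTIPLE so that several threads may call MPI concurrently. The thread function bodies and names are placeholders, not Simone's actual code:

#include <mpi.h>
#include <pthread.h>
#include <stdio.h>

/* Placeholder thread bodies; the real loops are not shown here. */
static void *sender(void *arg)   { (void)arg; /* MPI_Send loop */            return NULL; }
static void *receiver(void *arg) { (void)arg; /* MPI_Iprobe / MPI_Recv loop */ return NULL; }
static void *spawner(void *arg)  { (void)arg; /* MPI_Comm_spawn loop */       return NULL; }

int main(int argc, char **argv)
{
    int provided;
    /* Concurrent MPI calls from several threads need MPI_THREAD_MULTIPLE. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "MPI_THREAD_MULTIPLE not available (got %d)\n", provided);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    pthread_t t[3];
    pthread_create(&t[0], NULL, sender,   NULL);
    pthread_create(&t[1], NULL, receiver, NULL);
    pthread_create(&t[2], NULL, spawner,  NULL);
    for (int i = 0; i < 3; i++)
        pthread_join(t[i], NULL);

    MPI_Finalize();
    return 0;
}
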
> How was OMPI configured when it was installed? If you didn't install it, then provide the output of ompi_info - it will tell us.
[@arch-moto tasksys]$ ompi_info
                  Package: Open MPI nobody_at_alderaan Distribution
                 Open MPI: 1.5.3
    Open MPI SVN revision: r24532
    Open MPI release date: Mar 16, 2011
                 Open RTE: 1.5.3
    Open RTE SVN revision: r24532
    Open RTE release date: Mar 16, 2011
                     OPAL: 1.5.3
        OPAL SVN revision: r24532
        OPAL release date: Mar 16, 2011
             Ident string: 1.5.3
                   Prefix: /usr
  Configured architecture: x86_64-unknown-linux-gnu
           Configure host: alderaan
            Configured by: nobody
            Configured on: Thu Jul 7 13:21:35 UTC 2011
           Configure host: alderaan
                 Built by: nobody
                 Built on: Thu Jul 7 13:27:08 UTC 2011
               Built host: alderaan
               C bindings: yes
             C++ bindings: yes
       Fortran77 bindings: yes (all)
       Fortran90 bindings: yes
  Fortran90 bindings size: small
               C compiler: gcc
      C compiler absolute: /usr/bin/gcc
   C compiler family name: GNU
       C compiler version: 4.6.1
             C++ compiler: g++
    C++ compiler absolute: /usr/bin/g++
       Fortran77 compiler: gfortran
   Fortran77 compiler abs: /usr/bin/gfortran
       Fortran90 compiler: /usr/bin/gfortran
   Fortran90 compiler abs:
              C profiling: yes
            C++ profiling: yes
      Fortran77 profiling: yes
      Fortran90 profiling: yes
           C++ exceptions: no
           Thread support: posix (mpi: yes, progress: no)
            Sparse Groups: no
   Internal debug support: yes
   MPI interface warnings: no
      MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
          libltdl support: yes
    Heterogeneous support: no
  mpirun default --prefix: no
          MPI I/O support: yes
        MPI_WTIME support: gettimeofday
      Symbol vis. support: yes
           MPI extensions: affinity example
    FT Checkpoint support: no (checkpoint thread: no)
   MPI_MAX_PROCESSOR_NAME: 256
     MPI_MAX_ERROR_STRING: 256
      MPI_MAX_OBJECT_NAME: 64
         MPI_MAX_INFO_KEY: 36
         MPI_MAX_INFO_VAL: 256
        MPI_MAX_PORT_NAME: 1024
   MPI_MAX_DATAREP_STRING: 128
            MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.5.3)
           MCA memchecker: valgrind (MCA v2.0, API v2.0, Component v1.5.3)
               MCA memory: linux (MCA v2.0, API v2.0, Component v1.5.3)
            MCA paffinity: hwloc (MCA v2.0, API v2.0, Component v1.5.3)
                 MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.5.3)
                MCA carto: file (MCA v2.0, API v2.0, Component v1.5.3)
            MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.5.3)
                MCA timer: linux (MCA v2.0, API v2.0, Component v1.5.3)
          MCA installdirs: env (MCA v2.0, API v2.0, Component v1.5.3)
          MCA installdirs: config (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA dpm: orte (MCA v2.0, API v2.0, Component v1.5.3)
               MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.5.3)
            MCA allocator: basic (MCA v2.0, API v2.0, Component v1.5.3)
            MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.5.3)
                 MCA coll: basic (MCA v2.0, API v2.0, Component v1.5.3)
                 MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.5.3)
                 MCA coll: inter (MCA v2.0, API v2.0, Component v1.5.3)
                 MCA coll: self (MCA v2.0, API v2.0, Component v1.5.3)
                 MCA coll: sm (MCA v2.0, API v2.0, Component v1.5.3)
                 MCA coll: sync (MCA v2.0, API v2.0, Component v1.5.3)
                 MCA coll: tuned (MCA v2.0, API v2.0, Component v1.5.3)
                   MCA io: romio (MCA v2.0, API v2.0, Component v1.5.3)
                MCA mpool: fake (MCA v2.0, API v2.0, Component v1.5.3)
                MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.5.3)
                MCA mpool: sm (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA pml: bfo (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA pml: csum (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA pml: v (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA bml: r2 (MCA v2.0, API v2.0, Component v1.5.3)
               MCA rcache: vma (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA btl: self (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA btl: sm (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA btl: tcp (MCA v2.0, API v2.0, Component v1.5.3)
                 MCA topo: unity (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA osc: rdma (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA iof: hnp (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA iof: orted (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA iof: tool (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA oob: tcp (MCA v2.0, API v2.0, Component v1.5.3)
                 MCA odls: default (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA ras: cm (MCA v2.0, API v2.0, Component v1.5.3)
                 MCA rmaps: load_balance (MCA v2.0, API v2.0, Component v1.5.3)
                MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.5.3)
                MCA rmaps: resilient (MCA v2.0, API v2.0, Component v1.5.3)
                 MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.5.3)
                MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.5.3)
                MCA rmaps: topo (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA rml: oob (MCA v2.0, API v2.0, Component v1.5.3)
               MCA routed: binomial (MCA v2.0, API v2.0, Component v1.5.3)
               MCA routed: cm (MCA v2.0, API v2.0, Component v1.5.3)
               MCA routed: direct (MCA v2.0, API v2.0, Component v1.5.3)
               MCA routed: linear (MCA v2.0, API v2.0, Component v1.5.3)
               MCA routed: radix (MCA v2.0, API v2.0, Component v1.5.3)
               MCA routed: slave (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA plm: rsh (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA plm: rshd (MCA v2.0, API v2.0, Component v1.5.3)
                MCA filem: rsh (MCA v2.0, API v2.0, Component v1.5.3)
               MCA errmgr: default (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA ess: env (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA ess: hnp (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA ess: singleton (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA ess: slave (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA ess: tool (MCA v2.0, API v2.0, Component v1.5.3)
              MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.5.3)
              MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.5.3)
              MCA grpcomm: hier (MCA v2.0, API v2.0, Component v1.5.3)
             MCA notifier: command (MCA v2.0, API v1.0, Component v1.5.3)
             MCA notifier: syslog (MCA v2.0, API v1.0, Component v1.5.3)

>
>>> Adding the debug flags just changes the race condition. Interestingly, those values only impact the behavior of mpirun, so it looks like the race condition is occurring there.
>> The problem is that the error is totally nondeterministic. Sometimes it happens, sometimes it does not, and the error message gives me no clue where it is coming from. Is it a problem in my code or inside MPI?
> Can't tell, but it is likely an impact of threading. Race conditions within threaded environments are common, and OMPI isn't particularly thread safe, especially when it comes to comm_spawn.
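
Not discussed further in this thread, but one common defensive pattern when comm_spawn's thread safety is in doubt is to funnel every spawn through a single lock so that no two threads spawn concurrently. A hypothetical sketch follows; spawn_one, the lock, and the single-child arguments are invented for illustration, and this only narrows the window for races rather than fixing anything inside Open MPI:

#include <mpi.h>
#include <pthread.h>

/* Hypothetical helper: serialize all MPI_Comm_spawn calls behind one mutex. */
static pthread_mutex_t spawn_lock = PTHREAD_MUTEX_INITIALIZER;

int spawn_one(char *cmd, MPI_Comm *child)
{
    int errcode;
    pthread_mutex_lock(&spawn_lock);
    int rc = MPI_Comm_spawn(cmd, MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                            0, MPI_COMM_SELF, child, &errcode);
    pthread_mutex_unlock(&spawn_lock);
    return rc;
}
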
>
>> cheers, Simone
>>>
>>> On Sep 6, 2011, at 3:01 AM, Simone Pellegrini wrote:
>>>
>>>> Dear all,
>>>> I am developing an MPI application which heavily uses MPI_Comm_spawn (a stripped-down sketch of the spawn pattern appears at the end of this message). Usually everything works fine for the first hundred or so spawns, but after a while the application exits with a curious message:
>>>>
>>>> [arch-top:27712] [[36904,165],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file base/grpcomm_base_modex.c at line 349
>>>> [arch-top:27712] [[36904,165],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file grpcomm_bad_module.c at line 518
>>>> --------------------------------------------------------------------------
>>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>>> likely to abort. There are many reasons that a parallel process can
>>>> fail during MPI_INIT; some of which are due to configuration or environment
>>>> problems. This failure appears to be an internal failure; here's some
>>>> additional information (which may only be relevant to an Open MPI
>>>> developer):
>>>>
>>>> ompi_proc_set_arch failed
>>>> --> Returned "Data unpack would read past end of buffer" (-26) instead of "Success" (0)
>>>> --------------------------------------------------------------------------
>>>> *** The MPI_Init_thread() function was called before MPI_INIT was invoked.
>>>> *** This is disallowed by the MPI standard.
>>>> *** Your MPI job will now abort.
>>>> [arch-top:27712] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
>>>> [arch-top:27714] [[36904,165],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file base/grpcomm_base_modex.c at line 349
>>>> [arch-top:27714] [[36904,165],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file grpcomm_bad_module.c at line 518
>>>> *** The MPI_Init_thread() function was called before MPI_INIT was invoked.
>>>> *** This is disallowed by the MPI standard.
>>>> *** Your MPI job will now abort.
>>>> [arch-top:27714] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
>>>> [arch-top:27226] 1 more process has sent help message help-mpi-runtime / mpi_init:startup:internal-failure
>>>> [arch-top:27226] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>>>>
>>>> Using MPI_Init instead of MPI_Init_thread does not help either; the same error occurs.
>>>>
>>>> Strangely, the error does not occur if I run the code with debug output enabled (-mca plm_base_verbose 5 -mca rmaps_base_verbose 5).
>>>>
>>>> I am using Open MPI 1.5.3
>>>>
>>>> cheers, Simone
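
For reference, a stripped-down parent-side sketch of the repeated spawn/teardown pattern described in this message. The "./worker" executable, the loop count, and the single-process spawn are placeholders, not taken from Simone's application:

#include <mpi.h>

int main(int argc, char **argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    /* Spawn children repeatedly; each iteration creates and tears down an
     * intercommunicator. MPI_Comm_disconnect is collective, so the worker
     * would mirror the disconnect on its side. */
    for (int i = 0; i < 200; i++) {
        MPI_Comm child;
        int errcode;
        MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &child, &errcode);
        MPI_Comm_disconnect(&child);
    }

    MPI_Finalize();
    return 0;
}
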