
Subject: Re: [OMPI users] MPI_Spawn error: "Data unpack would read past end of buffer" (-26) instead of "Success"
From: Simone Pellegrini (spellegrini_at_[hidden])
Date: 2011-09-06 15:20:27


On 09/06/2011 04:58 PM, Ralph Castain wrote:
> On Sep 6, 2011, at 12:49 PM, Simone Pellegrini wrote:
>
>> On 09/06/2011 02:57 PM, Ralph Castain wrote:
>>> Hi Simone
>>>
>>> Just to clarify: is your application threaded? Could you please send the OMPI configure cmd you used?
>> yes, it is threaded. There are basically 3 threads: one for outgoing messages (MPI_Send), one for incoming messages (MPI_Iprobe / MPI_Recv), and one for spawning.
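
[For context, a minimal sketch of the three-thread layout described above; it is not taken from the original mails. It assumes MPI_THREAD_MULTIPLE is requested and checks the provided level; the thread bodies, the tag, and the "./worker" command are hypothetical placeholders.]

/* Hedged sketch: one sender thread, one receiver thread polling with
 * MPI_Iprobe/MPI_Recv, and one thread that spawns a child job.
 * Helper names and "./worker" are hypothetical. */
#include <mpi.h>
#include <pthread.h>
#include <stdio.h>

static void *sender_thread(void *arg) {
    int msg = 42;
    /* Outgoing traffic: send to self just to keep the sketch self-contained. */
    MPI_Send(&msg, 1, MPI_INT, 0, /*tag=*/1, MPI_COMM_WORLD);
    return NULL;
}

static void *receiver_thread(void *arg) {
    int flag = 0, msg;
    MPI_Status st;
    while (!flag) {                       /* poll for an incoming message */
        MPI_Iprobe(MPI_ANY_SOURCE, 1, MPI_COMM_WORLD, &flag, &st);
    }
    MPI_Recv(&msg, 1, MPI_INT, st.MPI_SOURCE, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("received %d\n", msg);
    return NULL;
}

static void *spawner_thread(void *arg) {
    MPI_Comm child;
    /* Spawn one child running a hypothetical ./worker executable. */
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &child, MPI_ERRCODES_IGNORE);
    MPI_Comm_disconnect(&child);
    return NULL;
}

int main(int argc, char **argv) {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "MPI_THREAD_MULTIPLE not provided\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    pthread_t t[3];
    pthread_create(&t[0], NULL, sender_thread, NULL);
    pthread_create(&t[1], NULL, receiver_thread, NULL);
    pthread_create(&t[2], NULL, spawner_thread, NULL);
    for (int i = 0; i < 3; ++i) pthread_join(t[i], NULL);

    MPI_Finalize();
    return 0;
}
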
>>
>> I am not sure what you mean by the OMPI configure cmd I used... I simply do mpirun --np 1 ./executable
> How was OMPI configured when it was installed? If you didn't install it, then provide the output of ompi_info - it will tell us.
[@arch-moto tasksys]$ ompi_info
                  Package: Open MPI nobody_at_alderaan Distribution
                 Open MPI: 1.5.3
    Open MPI SVN revision: r24532
    Open MPI release date: Mar 16, 2011
                 Open RTE: 1.5.3
    Open RTE SVN revision: r24532
    Open RTE release date: Mar 16, 2011
                     OPAL: 1.5.3
        OPAL SVN revision: r24532
        OPAL release date: Mar 16, 2011
             Ident string: 1.5.3
                   Prefix: /usr
  Configured architecture: x86_64-unknown-linux-gnu
           Configure host: alderaan
            Configured by: nobody
            Configured on: Thu Jul 7 13:21:35 UTC 2011
           Configure host: alderaan
                 Built by: nobody
                 Built on: Thu Jul 7 13:27:08 UTC 2011
               Built host: alderaan
               C bindings: yes
             C++ bindings: yes
       Fortran77 bindings: yes (all)
       Fortran90 bindings: yes
  Fortran90 bindings size: small
               C compiler: gcc
      C compiler absolute: /usr/bin/gcc
   C compiler family name: GNU
       C compiler version: 4.6.1
             C++ compiler: g++
    C++ compiler absolute: /usr/bin/g++
       Fortran77 compiler: gfortran
   Fortran77 compiler abs: /usr/bin/gfortran
       Fortran90 compiler: /usr/bin/gfortran
   Fortran90 compiler abs:
              C profiling: yes
            C++ profiling: yes
      Fortran77 profiling: yes
      Fortran90 profiling: yes
           C++ exceptions: no
           Thread support: posix (mpi: yes, progress: no)
            Sparse Groups: no
   Internal debug support: yes
   MPI interface warnings: no
      MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
          libltdl support: yes
    Heterogeneous support: no
  mpirun default --prefix: no
          MPI I/O support: yes
        MPI_WTIME support: gettimeofday
      Symbol vis. support: yes
           MPI extensions: affinity example
    FT Checkpoint support: no (checkpoint thread: no)
   MPI_MAX_PROCESSOR_NAME: 256
     MPI_MAX_ERROR_STRING: 256
      MPI_MAX_OBJECT_NAME: 64
         MPI_MAX_INFO_KEY: 36
         MPI_MAX_INFO_VAL: 256
        MPI_MAX_PORT_NAME: 1024
   MPI_MAX_DATAREP_STRING: 128
            MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.5.3)
           MCA memchecker: valgrind (MCA v2.0, API v2.0, Component v1.5.3)
               MCA memory: linux (MCA v2.0, API v2.0, Component v1.5.3)
            MCA paffinity: hwloc (MCA v2.0, API v2.0, Component v1.5.3)
                 MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.5.3)
                MCA carto: file (MCA v2.0, API v2.0, Component v1.5.3)
            MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.5.3)
                MCA timer: linux (MCA v2.0, API v2.0, Component v1.5.3)
          MCA installdirs: env (MCA v2.0, API v2.0, Component v1.5.3)
          MCA installdirs: config (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA dpm: orte (MCA v2.0, API v2.0, Component v1.5.3)
               MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.5.3)
            MCA allocator: basic (MCA v2.0, API v2.0, Component v1.5.3)
            MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.5.3)
                 MCA coll: basic (MCA v2.0, API v2.0, Component v1.5.3)
                 MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.5.3)
                 MCA coll: inter (MCA v2.0, API v2.0, Component v1.5.3)
                 MCA coll: self (MCA v2.0, API v2.0, Component v1.5.3)
                 MCA coll: sm (MCA v2.0, API v2.0, Component v1.5.3)
                 MCA coll: sync (MCA v2.0, API v2.0, Component v1.5.3)
                 MCA coll: tuned (MCA v2.0, API v2.0, Component v1.5.3)
                   MCA io: romio (MCA v2.0, API v2.0, Component v1.5.3)
                MCA mpool: fake (MCA v2.0, API v2.0, Component v1.5.3)
                MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.5.3)
                MCA mpool: sm (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA pml: bfo (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA pml: csum (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA pml: v (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA bml: r2 (MCA v2.0, API v2.0, Component v1.5.3)
               MCA rcache: vma (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA btl: self (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA btl: sm (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA btl: tcp (MCA v2.0, API v2.0, Component v1.5.3)
                 MCA topo: unity (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA osc: rdma (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA iof: hnp (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA iof: orted (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA iof: tool (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA oob: tcp (MCA v2.0, API v2.0, Component v1.5.3)
                 MCA odls: default (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA ras: cm (MCA v2.0, API v2.0, Component v1.5.3)
                 MCA rmaps: load_balance (MCA v2.0, API v2.0, Component v1.5.3)
                MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.5.3)
                MCA rmaps: resilient (MCA v2.0, API v2.0, Component v1.5.3)
                 MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.5.3)
                MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.5.3)
                MCA rmaps: topo (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA rml: oob (MCA v2.0, API v2.0, Component v1.5.3)
               MCA routed: binomial (MCA v2.0, API v2.0, Component v1.5.3)
               MCA routed: cm (MCA v2.0, API v2.0, Component v1.5.3)
               MCA routed: direct (MCA v2.0, API v2.0, Component v1.5.3)
               MCA routed: linear (MCA v2.0, API v2.0, Component v1.5.3)
               MCA routed: radix (MCA v2.0, API v2.0, Component v1.5.3)
               MCA routed: slave (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA plm: rsh (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA plm: rshd (MCA v2.0, API v2.0, Component v1.5.3)
                MCA filem: rsh (MCA v2.0, API v2.0, Component v1.5.3)
               MCA errmgr: default (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA ess: env (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA ess: hnp (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA ess: singleton (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA ess: slave (MCA v2.0, API v2.0, Component v1.5.3)
                  MCA ess: tool (MCA v2.0, API v2.0, Component v1.5.3)
              MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.5.3)
              MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.5.3)
              MCA grpcomm: hier (MCA v2.0, API v2.0, Component v1.5.3)
             MCA notifier: command (MCA v2.0, API v1.0, Component v1.5.3)
             MCA notifier: syslog (MCA v2.0, API v1.0, Component v1.5.3)

>
>>> Adding the debug flags just changes the race condition. Interestingly, those values only impact the behavior of mpirun, so it looks like the race condition is occurring there.
>> The problem is that the error is totally nondeterministic. Sometimes it happens, sometimes it doesn't, but the error message gives me no clue where it is coming from. Is it a problem with my code or something internal to MPI?
> Can't tell, but it is likely an impact of threading. Race conditions within threaded environments are common, and OMPI isn't particularly thread safe, especially when it comes to comm_spawn.
>
>> cheers, Simone
>>>
>>> On Sep 6, 2011, at 3:01 AM, Simone Pellegrini wrote:
>>>
>>>> Dear all,
>>>> I am developing an MPI application which makes heavy use of MPI_Spawn. Usually everything works fine for the first hundred spawns, but after a while the application exits with a curious message:
>>>>
>>>> [arch-top:27712] [[36904,165],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file base/grpcomm_base_modex.c at line 349
>>>> [arch-top:27712] [[36904,165],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file grpcomm_bad_module.c at line 518
>>>> --------------------------------------------------------------------------
>>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>>> likely to abort. There are many reasons that a parallel process can
>>>> fail during MPI_INIT; some of which are due to configuration or environment
>>>> problems. This failure appears to be an internal failure; here's some
>>>> additional information (which may only be relevant to an Open MPI
>>>> developer):
>>>>
>>>> ompi_proc_set_arch failed
>>>> --> Returned "Data unpack would read past end of buffer" (-26) instead of "Success" (0)
>>>> --------------------------------------------------------------------------
>>>> *** The MPI_Init_thread() function was called before MPI_INIT was invoked.
>>>> *** This is disallowed by the MPI standard.
>>>> *** Your MPI job will now abort.
>>>> [arch-top:27712] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
>>>> [arch-top:27714] [[36904,165],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file base/grpcomm_base_modex.c at line 349
>>>> [arch-top:27714] [[36904,165],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file grpcomm_bad_module.c at line 518
>>>> *** The MPI_Init_thread() function was called before MPI_INIT was invoked.
>>>> *** This is disallowed by the MPI standard.
>>>> *** Your MPI job will now abort.
>>>> [arch-top:27714] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
>>>> [arch-top:27226] 1 more process has sent help message help-mpi-runtime / mpi_init:startup:internal-failure
>>>> [arch-top:27226] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>>>>
>>>> Using MPI_Init instead of MPI_Init_thread does not help either; the same error occurs.
>>>>
>>>> Strangely, the error does not occur if I run the code with debugging enabled (-mca plm_base_verbose 5 -mca rmaps_base_verbose 5).
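
[For reference, a minimal repeated-spawn sketch of the pattern described in this mail; it is not the author's actual code. The "./worker" command and the iteration count are hypothetical, and in the real application the spawning happens from a dedicated thread.]

/* Hedged sketch: a parent that repeatedly calls MPI_Comm_spawn, roughly
 * the pattern reported above ("works for the first hundred spawns"). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided;
    /* MPI_Init was reportedly tried as well, with the same result. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    for (int i = 0; i < 200; ++i) {       /* error reportedly appears after ~100 spawns */
        MPI_Comm child;
        MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &child, MPI_ERRCODES_IGNORE);
        MPI_Comm_disconnect(&child);      /* let the child job tear down between iterations */
        printf("spawn %d done\n", i);
    }

    MPI_Finalize();
    return 0;
}

Disconnecting from each child before the next spawn keeps only one child job alive at a time; such a reproducer could be launched with or without the verbose MCA flags mentioned above to compare the two behaviours.
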
>>>>
>>>> I am using Open MPI 1.5.3.
>>>>
>>>> cheers, Simone