
Open MPI User's Mailing List Archives


From: hpetit_at_[hidden]
Date: 2006-10-30 10:47:48


Hi,
I have a problem using MPI_Comm_spawn_multiple together with bproc.

I want to use the MPI_Comm_spawn_multiple call to spawn a set of executables, but in a bproc environment the program either crashes or hangs on this call (depending on the Open MPI release used).

I have created a test program that spawns one other program on the same host (see the code listings at the end of this mail).

* With Open MPI 1.1.2, the program crashes on the MPI_Comm_spawn_multiple call:
<--------------------------------->
[myhost:17061] [0,0,0] ORTE_ERROR_LOG: Not available in file ras_bjs.c at line 253
main_exe: Begining of main_exe
main_exe: Call MPI_Init
main_exe: Call MPI_Comm_spawn_multiple()
[myhost:17061] [0,0,0] ORTE_ERROR_LOG: Not available in file ras_bjs.c at line 253
Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:(nil)
[0] func:/usr/local/Mpi/openmpi-1.1.2/lib/libopal.so.0 [0xb7f70ccf]
[1] func:[0xffffe440]
[2] func:/usr/local/Mpi/openmpi-1.1.2/lib/liborte.so.0(orte_schema_base_get_node_tokens+0x7f) [0xb7fdc41f]
[3] func:/usr/local/Mpi/openmpi-1.1.2/lib/liborte.so.0(orte_ras_base_node_assign+0x20b) [0xb7fd230b]
[4] func:/usr/local/Mpi/openmpi-1.1.2/lib/liborte.so.0(orte_ras_base_allocate_nodes+0x41) [0xb7fd0371]
[5] func:/usr/local/Mpi/openmpi-1.1.2/lib/openmpi/mca_ras_hostfile.so [0xb7538ba8]
[6] func:/usr/local/Mpi/openmpi-1.1.2/lib/liborte.so.0(orte_ras_base_allocate+0xd0) [0xb7fd0470]
[7] func:/usr/local/Mpi/openmpi-1.1.2/lib/openmpi/mca_rmgr_urm.so [0xb754d62f]
[8] func:/usr/local/Mpi/openmpi-1.1.2/lib/liborte.so.0(orte_rmgr_base_cmd_dispatch+0x137) [0xb7fd9187]
[9] func:/usr/local/Mpi/openmpi-1.1.2/lib/openmpi/mca_rmgr_urm.so [0xb754e09e]
[10] func:/usr/local/Mpi/openmpi-1.1.2/lib/liborte.so.0 [0xb7fcd00e]
[11] func:/usr/local/Mpi/openmpi-1.1.2/lib/openmpi/mca_oob_tcp.so [0xb7585084]
[12] func:/usr/local/Mpi/openmpi-1.1.2/lib/openmpi/mca_oob_tcp.so [0xb7586763]
[13] func:/usr/local/Mpi/openmpi-1.1.2/lib/libopal.so.0(opal_event_loop+0x199) [0xb7f5f7a9]
[14] func:/usr/local/Mpi/openmpi-1.1.2/lib/libopal.so.0 [0xb7f60353]
[15] func:/lib/tls/libpthread.so.0 [0xb7ef7b63]
[16] func:/lib/tls/libc.so.6(__clone+0x5a) [0xb7e9518a]
*** End of error message ***
<----------------------------------------------->

* With Open MPI 1.1.1, the program simply hangs on the MPI_Comm_spawn_multiple call:
<--------------------------------->
[myhost:17187] [0,0,0] ORTE_ERROR_LOG: Not available in file ras_bjs.c at line 253
main_exe: Begining of main_exe
main_exe: Call MPI_Init
main_exe: Call MPI_Comm_spawn_multiple()
[myhost:17187] [0,0,0] ORTE_ERROR_LOG: Not available in file ras_bjs.c at line 253
<--------------------------------->

* With Open MPI 1.0.2, the program also hangs on the MPI_Comm_spawn_multiple call, but there is no ORTE_ERROR_LOG message:
<--------------------------------->
main_exe: Begining of main_exe
main_exe: Call MPI_Init
main_exe: Call MPI_Comm_spawn_multiple()
<--------------------------------->


* With Open MPI 1.1.2 in a non-bproc environment, the program works just fine:
<--------------------------------->
main_exe: Begining of main_exe
main_exe: Call MPI_Init
main_exe: Call MPI_Comm_spawn_multiple()
spawned_exe: Begining of spawned_exe
spawned_exe: Call MPI_Init
main_exe: Back from MPI_Comm_spawn_multiple() result = 0
main_exe: Spawned exe returned errcode = 0
spawned_exe: This exe does not do really much thing actually
main_exe: Call MPI_finalize
main_exe: End of main_exe
<--------------------------------->

Can you help me solve this problem?

Regards.

Herve


The bproc release is:
bproc: Beowulf Distributed Process Space Version 4.0.0pre8
bproc: (C) 1999-2003 Erik Hendriks <erik_at_[hidden]>
bproc: Initializing node set. node_ct=1 id_ct=1

The system is Debian sarge with a 2.6.9 kernel patched with bproc.

Finally, here is the ompi_info output for the Open MPI 1.1.2 release:
                Open MPI: 1.1.2
   Open MPI SVN revision: r12073
                Open RTE: 1.1.2
   Open RTE SVN revision: r12073
                    OPAL: 1.1.2
       OPAL SVN revision: r12073
                  Prefix: /usr/local/Mpi/openmpi-1.1.2
 Configured architecture: i686-pc-linux-gnu
           Configured by: itrsat
           Configured on: Mon Oct 23 12:55:17 CEST 2006
          Configure host: myhost
                Built by: setics
                Built on: lun oct 23 13:09:47 CEST 2006
              Built host: myhost
              C bindings: yes
            C++ bindings: yes
      Fortran77 bindings: no
      Fortran90 bindings: no
 Fortran90 bindings size: na
              C compiler: gcc
     C compiler absolute: /usr/bin/gcc
            C++ compiler: g++
   C++ compiler absolute: /usr/bin/g++
      Fortran77 compiler: none
  Fortran77 compiler abs: none
      Fortran90 compiler: none
  Fortran90 compiler abs: none
             C profiling: yes
           C++ profiling: yes
     Fortran77 profiling: no
     Fortran90 profiling: no
          C++ exceptions: no
          Thread support: posix (mpi: yes, progress: yes)
  Internal debug support: no
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
         libltdl support: yes
              MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.1.2)
           MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1.2)
           MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1.2)
               MCA timer: linux (MCA v1.0, API v1.0, Component v1.1.2)
           MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
           MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
                MCA coll: basic (MCA v1.0, API v1.0, Component v1.1.2)
                MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1.2)
                MCA coll: self (MCA v1.0, API v1.0, Component v1.1.2)
                MCA coll: sm (MCA v1.0, API v1.0, Component v1.1.2)
                MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1.2)
                  MCA io: romio (MCA v1.0, API v1.0, Component v1.1.2)
               MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1.2)
                 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1.2)
                 MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1.2)
              MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1.2)
                 MCA btl: self (MCA v1.0, API v1.0, Component v1.1.2)
                 MCA btl: sm (MCA v1.0, API v1.0, Component v1.1.2)
                 MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
                MCA topo: unity (MCA v1.0, API v1.0, Component v1.1.2)
                 MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
                 MCA gpr: null (MCA v1.0, API v1.0, Component v1.1.2)
                 MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1.2)
                 MCA gpr: replica (MCA v1.0, API v1.0, Component v1.1.2)
                 MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1.2)
                 MCA iof: svc (MCA v1.0, API v1.0, Component v1.1.2)
                  MCA ns: proxy (MCA v1.0, API v1.0, Component v1.1.2)
                  MCA ns: replica (MCA v1.0, API v1.0, Component v1.1.2)
                 MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
                 MCA ras: bjs (MCA v1.0, API v1.0, Component v1.1.2)
                 MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.1.2)
                 MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.1.2)
                 MCA ras: localhost (MCA v1.0, API v1.0, Component v1.1.2)
                 MCA ras: lsf_bproc (MCA v1.0, API v1.0, Component v1.1.2)
                 MCA ras: poe (MCA v1.0, API v1.0, Component v1.1.2)
                 MCA ras: slurm (MCA v1.0, API v1.0, Component v1.1.2)
                 MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.1.2)
                 MCA rds: resfile (MCA v1.0, API v1.0, Component v1.1.2)
               MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.1.2)
                MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.1.2)
                MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.1.2)
                 MCA rml: oob (MCA v1.0, API v1.0, Component v1.1.2)
                 MCA pls: bproc (MCA v1.0, API v1.0, Component v1.1.2)
                 MCA pls: bproc_orted (MCA v1.0, API v1.0, Component v1.1.2)
                 MCA pls: fork (MCA v1.0, API v1.0, Component v1.1.2)
                 MCA pls: rsh (MCA v1.0, API v1.0, Component v1.1.2)
                 MCA pls: slurm (MCA v1.0, API v1.0, Component v1.1.2)
                 MCA sds: bproc (MCA v1.0, API v1.0, Component v1.1.2)
                 MCA sds: env (MCA v1.0, API v1.0, Component v1.1.2)
                 MCA sds: pipe (MCA v1.0, API v1.0, Component v1.1.2)
                 MCA sds: seed (MCA v1.0, API v1.0, Component v1.1.2)
                 MCA sds: singleton (MCA v1.0, API v1.0, Component v1.1.2)
                 MCA sds: slurm (MCA v1.0, API v1.0, Component v1.1.2)
                 MCA soh: bproc (MCA v1.0, API v1.0, Component v1.1.2)

The code listings follow:
* main_exe.c
<------------------------------------------------------------------->
#include "mpi.h"
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
int gethostname(char *nom, size_t lg);

int main( int argc, char **argv ) {

    /*
     * MPI_Comm_spawn_multiple parameters
     */
    int result, count, root;
    int maxprocs;
    char **commands;
    MPI_Info infos;
    int errcodes;

    MPI_Comm intercomm, newintracomm;
    int rank;
    char hostname[80];
    int len = sizeof (hostname);  /* buffer size passed to gethostname() */
    
    printf( "main_exe: Begining of main_exe\n");
    printf( "main_exe: Call MPI_Init\n");
    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    /*
     * MPI_Comm_spawn_multiple parameters
     */
    count = 1;
    maxprocs = 1;
    root = rank;
    
    commands = malloc (sizeof (char *));
    commands[0] = calloc (80, sizeof (char ));
    sprintf (commands[0], "./spawned_exe");
        
    MPI_Info_create( &infos );
    
    /* set proc/cpu info */
    result = MPI_Info_set( infos, "soft", "0:1" );
    
    /* set host info */
    result = gethostname ( hostname, len);
    if ( -1 == result ) {
        printf ("main_exe: Problem in gethostname\n");
    }
    result = MPI_Info_set( infos, "host", hostname );

    printf( "main_exe: Call MPI_Comm_spawn_multiple()\n");
    result = MPI_Comm_spawn_multiple( count,
                                      commands,
                                      MPI_ARGVS_NULL,
                                      &maxprocs,
                                      &infos,
                                      root,
                                      MPI_COMM_WORLD,
                                      &intercomm,
                                      &errcodes );
    printf( "main_exe: Back from MPI_Comm_spawn_multiple() result = %d\n", result);
    printf( "main_exe: Spawned exe returned errcode = %d\n", errcodes );

    MPI_Intercomm_merge( intercomm, 0, &newintracomm );

    /* Synchronisation with spawned exe */
    MPI_Barrier( newintracomm );

    free( commands[0] );
    free( commands );
    MPI_Comm_free( &newintracomm );

    printf( "main_exe: Call MPI_finalize\n");
    MPI_Finalize( );

    printf( "main_exe: End of main_exe\n");
    return 0;
}

<------------------------------------------------------------------->
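
The host name lookup used for the "host" info key in main_exe.c could also be isolated in a small helper that passes the buffer size explicitly and forces NUL termination. This is only a minimal sketch: the name get_local_hostname is hypothetical and is not part of the test program above, and it has not been run under bproc.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* make gethostname() visible in strict modes */
#endif

#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (an assumption, not taken from main_exe.c).
 * Copies the local host name into buf, which holds buflen bytes, and
 * guarantees that the result is NUL-terminated.
 * Returns 0 on success and -1 if gethostname() fails.
 */
static int get_local_hostname(char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);

    if (gethostname(buf, buflen) != 0) {
        buf[0] = '\0';            /* leave a valid empty string on failure */
        return -1;
    }

    /* POSIX does not promise termination when the name is truncated. */
    buf[buflen - 1] = '\0';
    return 0;
}
```

In main_exe.c this would be called as get_local_hostname(hostname, sizeof (hostname)) before the MPI_Info_set( infos, "host", hostname ) call, with the return value checked the same way as the existing gethostname result.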

* spawned_exe.c
<------------------------------------------------------------------->

#include "mpi.h"
#include <stdio.h>

int main( int argc, char **argv ) {
    MPI_Comm parent, newintracomm;

    printf ("spawned_exe: Begining of spawned_exe\n");
    printf( "spawned_exe: Call MPI_Init\n");
    MPI_Init( &argc, &argv );

    MPI_Comm_get_parent ( &parent );
    MPI_Intercomm_merge ( parent, 1, &newintracomm );

    printf( "spawned_exe: This exe does not do really much thing actually\n" );

    /* Synchronisation with main exe */
    MPI_Barrier( newintracomm );

    MPI_Comm_free( &newintracomm );

    printf( "spawned_exe: Call MPI_finalize\n");
    MPI_Finalize( );

    printf( "spawned_exe: End of spawned_exe\n");
    return 0;
}

<------------------------------------------------------------------->
