Open MPI User's Mailing List Archives

Subject: [OMPI users] IO issue with OpenMPI 1.4.1 and earlier versions
From: Steve Jones (stevejones_at_[hidden])
Date: 2011-09-12 22:44:25


Hi.

We've run into an I/O issue with Open MPI 1.4.1 and earlier versions. We can reproduce it in about 120 lines of code. I'd like to find out whether we're simply doing something incorrectly in the build or whether this is in fact a known bug. I've included the following, in order:

1. Configure options used on all versions tested
2. Successful run on 1.4.3
3. Failed run on 1.3.1
4. Failed run on 1.4.1
5. Source code of test
6. ompi_info

We're running this on a single node with 2 processes.

One additional thing to note: if we load the 1.4.2 or 1.4.3 environment, we can successfully run the 1.4.1 or 1.3.1 executable.

Thanks.

Steve

1.
./configure --prefix=/share/apps/openmpi/1.4.1/intel-12 --with-tm=/opt/torque --enable-debug --with-openib --with-wrapper-cflags="-shared-intel" --with-wrapper-cxxflags="-shared-intel" --with-wrapper-fflags="-shared-intel" --with-wrapper-fcflags="-shared-intel"

2.
[smjones_at_compute-1-1 ~]$ mpiexec codes/cti/tests/iotest/iotest.openmpi-1.4.3 10
iotest running on mpi_size: 2
writing 10 ints to file iotest.dat...
rank 0 writing: 0 to 4
rank 1 writing: 5 to 9
reading 10 ints from file iotest.dat...
just read: 0 0
just read: 1 1
just read: 2 2
just read: 3 3
just read: 4 4
just read: 5 5
just read: 6 6
just read: 7 7
just read: 8 8
just read: 9 9
File looks good.

3.
[smjones_at_compute-1-1 ~]$ mpiexec codes/cti/tests/iotest/iotest.openmpi-1.3.1 100
iotest running on mpi_size: 2
writing 100 ints to file iotest.dat...
rank 0 writing: 0 to 49
rank 1 writing: 50 to 99
reading 100 ints from file iotest.dat...
just read: 0 50
iotest.openmpi-1.3.1: iotest.cpp:105: int main(int, char**): Assertion `ibuf == i' failed.
[compute-1-1:18731] *** Process received signal ***
[compute-1-1:18731] Signal: Aborted (6)
[compute-1-1:18731] Signal code: (-6)
[compute-1-1:18731] [ 0] /lib64/libpthread.so.0 [0x357800e7c0]
[compute-1-1:18731] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x3577830265]
[compute-1-1:18731] [ 2] /lib64/libc.so.6(abort+0x110) [0x3577831d10]
[compute-1-1:18731] [ 3] /lib64/libc.so.6(__assert_fail+0xf6) [0x35778296e6]
[compute-1-1:18731] [ 4] codes/cti/tests/iotest/iotest.openmpi-1.3.1(main+0x3db) [0x408e7f]
[compute-1-1:18731] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4) [0x357781d994]
[compute-1-1:18731] [ 6] codes/cti/tests/iotest/iotest.openmpi-1.3.1(__gxx_personality_v0+0x139) [0x408989]
[compute-1-1:18731] *** End of error message ***
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 18731 on node compute-1-1.local exited on signal 6 (Aborted).
--------------------------------------------------------------------------

4.
[smjones_at_compute-1-1 ~]$ mpiexec codes/cti/tests/iotest/iotest.openmpi-1.4.1 100
iotest running on mpi_size: 2
writing 100 ints to file iotest.dat...
rank 1 writing: 50 to 99
rank 0 writing: 0 to 49
reading 100 ints from file iotest.dat...
just read: 0 50
iotest.openmpi-1.4.1: iotest.cpp:105: int __unixcall main(int, char **): Assertion `ibuf == i' failed.
[compute-1-1:19057] *** Process received signal ***
[compute-1-1:19057] Signal: Aborted (6)
[compute-1-1:19057] Signal code: (-6)
[compute-1-1:19057] [ 0] /lib64/libpthread.so.0 [0x357800e7c0]
[compute-1-1:19057] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x3577830265]
[compute-1-1:19057] [ 2] /lib64/libc.so.6(abort+0x110) [0x3577831d10]
[compute-1-1:19057] [ 3] /lib64/libc.so.6(__assert_fail+0xf6) [0x35778296e6]
[compute-1-1:19057] [ 4] codes/cti/tests/iotest/iotest.openmpi-1.4.1(main+0x472) [0x401ab2]
[compute-1-1:19057] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4) [0x357781d994]
[compute-1-1:19057] [ 6] codes/cti/tests/iotest/iotest.openmpi-1.4.1(__gxx_personality_v0+0x41) [0x401589]
[compute-1-1:19057] *** End of error message ***
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 19057 on node compute-1-1.local exited on signal 6 (Aborted).
--------------------------------------------------------------------------

5.
[smjones_at_frontend iotest]$ cat iotest.cpp
#include <iostream>
#include <stdio.h>   // fopen, fread, fclose
#include <stdlib.h>  // atoi
#include <assert.h>
#include <mpi.h>

using std::cout;
using std::cerr;
using std::endl;

// iotest
// This simple test reproduces a problem with writing through an MPI_Type_indexed file view in Open MPI.
//

int main(int argc,char * argv[]) {
  
  MPI_Init(&argc,&argv);

  int mpi_size;
  MPI_Comm_size(MPI_COMM_WORLD, &mpi_size);
  
  int mpi_rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &mpi_rank);
  
  if (mpi_rank == 0)
    cout << "iotest running on mpi_size: " << mpi_size << endl;
  
  if (argc != 2) {
    if (mpi_rank == 0)
      cout << "\n\nUsage: \n\nmpirun -np X iotest <global_number_of_ints>\n\n" << endl;
    MPI_Finalize();
    return(-1);
  }
  
  // how many ints to write...
  
  int n = atoi(argv[1]);
  if (mpi_rank == 0)
    cout << "writing " << n << " ints to file iotest.dat..." << endl;
  
  // everybody figure out their local offset and size...
  
  int my_disp = mpi_rank*n/mpi_size;
  int my_n = (mpi_rank+1)*n/mpi_size - my_disp;
  
  cout << "rank " << mpi_rank << " writing: " << my_disp << " to " << my_disp+my_n-1 << endl;

  MPI_File fh;
  int ierr = MPI_File_open(MPI_COMM_WORLD,"iotest.dat",
                           MPI_MODE_WRONLY | MPI_MODE_CREATE,
                           MPI_INFO_NULL,&fh);
  assert(ierr == 0);

  // build the type...

  MPI_Datatype int_type;
  MPI_Type_indexed(1,&my_n,&my_disp,MPI_INT,&int_type);
  

  MPI_Type_commit(&int_type);

  // fill a buffer of ints with increasing values, starting with our offset...
  
  int * buf = new int[my_n];
  for (int i = 0; i < my_n; ++i)
    buf[i] = my_disp + i;

  // set our view into the file...
  
  MPI_Offset offset = 0;
  MPI_File_set_view(fh, offset, MPI_INT, int_type, "native", MPI_INFO_NULL);
  
  // and write...

  MPI_Status status;
  MPI_File_write_all(fh, buf, my_n, MPI_INT, &status);
  
  // trim the file to the current size and close...

  offset += n*sizeof(int);
  MPI_File_set_size(fh,offset);
  MPI_File_close(&fh);

  // cleanup...

  delete[] buf;
  MPI_Type_free(&int_type);

  // ---------------------------------------------------

  // now let rank 0 read the file using standard io and check for
  // correctness...

  if (mpi_rank == 0) {

    cout << "reading " << n << " ints from file iotest.dat..." << endl;
    
    FILE * fp = fopen("iotest.dat","rb");
    for (int i = 0; i < n; ++i) {
      // just read one at a time - ouch!
      int ibuf;
      fread(&ibuf,sizeof(int),1,fp);
      cout << "just read: " << i << " " << ibuf << endl;
      assert(ibuf == i);
    }
    fclose(fp);
    cout << "File looks good." << endl;
    
  }
  MPI_Barrier(MPI_COMM_WORLD);
  
  // shut down MPI stuff...
  
  MPI_Finalize();
  return(0);
  
}

6.
[smjones_at_frontend iotest]$ ompi_info
                 Package: Open MPI root_at_[hidden] Distribution
                Open MPI: 1.4.3
   Open MPI SVN revision: r23834
   Open MPI release date: Oct 05, 2010
                Open RTE: 1.4.3
   Open RTE SVN revision: r23834
   Open RTE release date: Oct 05, 2010
                    OPAL: 1.4.3
       OPAL SVN revision: r23834
       OPAL release date: Oct 05, 2010
            Ident string: 1.4.3
                  Prefix: /share/apps/openmpi/1.4.3/intel-12
 Configured architecture: x86_64-unknown-linux-gnu
          Configure host: frontend.somewhere.com
           Configured by: root
           Configured on: Mon Sep 12 18:02:17 PDT 2011
          Configure host: frontend.somewhere.com
                Built by: root
                Built on: Mon Sep 12 18:13:08 PDT 2011
              Built host: frontend.somewhere.com
              C bindings: yes
            C++ bindings: yes
      Fortran77 bindings: yes (all)
      Fortran90 bindings: yes
 Fortran90 bindings size: small
              C compiler: icc
     C compiler absolute: /opt/intel/composerxe-2011.2.137/bin/intel64/icc
            C++ compiler: icpc
   C++ compiler absolute: /opt/intel/composerxe-2011.2.137/bin/intel64/icpc
      Fortran77 compiler: ifort
  Fortran77 compiler abs: /opt/intel/composerxe-2011.2.137/bin/intel64/ifort
      Fortran90 compiler: ifort
  Fortran90 compiler abs: /opt/intel/composerxe-2011.2.137/bin/intel64/ifort
             C profiling: yes
           C++ profiling: yes
     Fortran77 profiling: yes
     Fortran90 profiling: yes
          C++ exceptions: no
          Thread support: posix (mpi: no, progress: no)
           Sparse Groups: no
  Internal debug support: yes
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
         libltdl support: yes
   Heterogeneous support: no
 mpirun default --prefix: no
         MPI I/O support: yes
       MPI_WTIME support: gettimeofday
Symbol visibility support: yes
   FT Checkpoint support: no (checkpoint thread: no)
           MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.4.3)
              MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.4.3)
           MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.4.3)
               MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.4.3)
               MCA carto: file (MCA v2.0, API v2.0, Component v1.4.3)
           MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.4.3)
               MCA timer: linux (MCA v2.0, API v2.0, Component v1.4.3)
         MCA installdirs: env (MCA v2.0, API v2.0, Component v1.4.3)
         MCA installdirs: config (MCA v2.0, API v2.0, Component v1.4.3)
                 MCA dpm: orte (MCA v2.0, API v2.0, Component v1.4.3)
              MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.4.3)
           MCA allocator: basic (MCA v2.0, API v2.0, Component v1.4.3)
           MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.4.3)
                MCA coll: basic (MCA v2.0, API v2.0, Component v1.4.3)
                MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.4.3)
                MCA coll: inter (MCA v2.0, API v2.0, Component v1.4.3)
                MCA coll: self (MCA v2.0, API v2.0, Component v1.4.3)
                MCA coll: sm (MCA v2.0, API v2.0, Component v1.4.3)
                MCA coll: sync (MCA v2.0, API v2.0, Component v1.4.3)
                MCA coll: tuned (MCA v2.0, API v2.0, Component v1.4.3)
                  MCA io: romio (MCA v2.0, API v2.0, Component v1.4.3)
               MCA mpool: fake (MCA v2.0, API v2.0, Component v1.4.3)
               MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.4.3)
               MCA mpool: sm (MCA v2.0, API v2.0, Component v1.4.3)
                 MCA pml: cm (MCA v2.0, API v2.0, Component v1.4.3)
                 MCA pml: csum (MCA v2.0, API v2.0, Component v1.4.3)
                 MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.4.3)
                 MCA pml: v (MCA v2.0, API v2.0, Component v1.4.3)
                 MCA bml: r2 (MCA v2.0, API v2.0, Component v1.4.3)
              MCA rcache: vma (MCA v2.0, API v2.0, Component v1.4.3)
                 MCA btl: ofud (MCA v2.0, API v2.0, Component v1.4.3)
                 MCA btl: openib (MCA v2.0, API v2.0, Component v1.4.3)
                 MCA btl: self (MCA v2.0, API v2.0, Component v1.4.3)
                 MCA btl: sm (MCA v2.0, API v2.0, Component v1.4.3)
                 MCA btl: tcp (MCA v2.0, API v2.0, Component v1.4.3)
                MCA topo: unity (MCA v2.0, API v2.0, Component v1.4.3)
                 MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.4.3)
                 MCA osc: rdma (MCA v2.0, API v2.0, Component v1.4.3)
                 MCA iof: hnp (MCA v2.0, API v2.0, Component v1.4.3)
                 MCA iof: orted (MCA v2.0, API v2.0, Component v1.4.3)
                 MCA iof: tool (MCA v2.0, API v2.0, Component v1.4.3)
                 MCA oob: tcp (MCA v2.0, API v2.0, Component v1.4.3)
                MCA odls: default (MCA v2.0, API v2.0, Component v1.4.3)
                 MCA ras: slurm (MCA v2.0, API v2.0, Component v1.4.3)
                 MCA ras: tm (MCA v2.0, API v2.0, Component v1.4.3)
               MCA rmaps: load_balance (MCA v2.0, API v2.0, Component v1.4.3)
               MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.4.3)
               MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.4.3)
               MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.4.3)
                 MCA rml: oob (MCA v2.0, API v2.0, Component v1.4.3)
              MCA routed: binomial (MCA v2.0, API v2.0, Component v1.4.3)
              MCA routed: direct (MCA v2.0, API v2.0, Component v1.4.3)
              MCA routed: linear (MCA v2.0, API v2.0, Component v1.4.3)
                 MCA plm: rsh (MCA v2.0, API v2.0, Component v1.4.3)
                 MCA plm: slurm (MCA v2.0, API v2.0, Component v1.4.3)
                 MCA plm: tm (MCA v2.0, API v2.0, Component v1.4.3)
               MCA filem: rsh (MCA v2.0, API v2.0, Component v1.4.3)
              MCA errmgr: default (MCA v2.0, API v2.0, Component v1.4.3)
                 MCA ess: env (MCA v2.0, API v2.0, Component v1.4.3)
                 MCA ess: hnp (MCA v2.0, API v2.0, Component v1.4.3)
                 MCA ess: singleton (MCA v2.0, API v2.0, Component v1.4.3)
                 MCA ess: slurm (MCA v2.0, API v2.0, Component v1.4.3)
                 MCA ess: tool (MCA v2.0, API v2.0, Component v1.4.3)
             MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.4.3)
             MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.4.3)