Open MPI User's Mailing List Archives

From: Frank Kahle (openmpi-user_at_[hidden])
Date: 2006-07-05 13:15:37


users-request_at_[hidden] wrote:
> A few clarifying questions:
>
> What is your netmask on these hosts?
>
> Where is the MPI_ALLREDUCE in your app -- right away, or somewhere deep
> within the application? Can you replicate this with a simple MPI
> application that essentially calls MPI_INIT, MPI_ALLREDUCE, and
> MPI_FINALIZE?
>
> Can you replicate this with a simple MPI app that does an MPI_SEND /
> MPI_RECV between two processes on the different subnets?
>
> Thanks.
>
>

@ Jeff,

The netmask is 255.255.255.0.

Running a simple "hello world" within either subnet alone yields no
error, but running it across both subnets yields this error:

[g5dual.3-net:00436] *** An error occurred in MPI_Send
[g5dual.3-net:00436] *** on communicator MPI_COMM_WORLD
[g5dual.3-net:00436] *** MPI_ERR_INTERN: internal error
[g5dual.3-net:00436] *** MPI_ERRORS_ARE_FATAL (goodbye)

Hope this helps!

Frank

Just in case you want to check the source:
c Fortran example hello_world
c Rank 0 sends a 12-character greeting to every other rank;
c each of the other ranks receives it. All ranks then print it.
      program hello
      include 'mpif.h'
      integer rank, size, ierror, tag, status(MPI_STATUS_SIZE)
      integer i
      character*12 message

      call MPI_INIT(ierror)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
      tag = 100

      if (rank .eq. 0) then
c       Sender: one MPI_SEND per remaining rank
        message = 'Hello, world'
        do i=1, size-1
          call MPI_SEND(message, 12, MPI_CHARACTER, i, tag,
     & MPI_COMM_WORLD, ierror)
        enddo

      else
c       Receivers: wait for the message from rank 0
        call MPI_RECV(message, 12, MPI_CHARACTER, 0, tag,
     & MPI_COMM_WORLD, status, ierror)
      endif

      print*, 'node', rank, ':', message
      call MPI_FINALIZE(ierror)
      end

or the full output:

[powerbook:/Network/CFD/hello] motte% mpirun -d -np 5 --hostfile ./hostfile /Network/CFD/hello/hello_world
[powerbook.2-net:00606] [0,0,0] setting up session dir with
[powerbook.2-net:00606] universe default-universe
[powerbook.2-net:00606] user motte
[powerbook.2-net:00606] host powerbook.2-net
[powerbook.2-net:00606] jobid 0
[powerbook.2-net:00606] procid 0
[powerbook.2-net:00606] procdir:
/tmp/openmpi-sessions-motte_at_powerbook.2-net_0/default-universe/0/0
[powerbook.2-net:00606] jobdir:
/tmp/openmpi-sessions-motte_at_powerbook.2-net_0/default-universe/0
[powerbook.2-net:00606] unidir:
/tmp/openmpi-sessions-motte_at_powerbook.2-net_0/default-universe
[powerbook.2-net:00606] top: openmpi-sessions-motte_at_powerbook.2-net_0
[powerbook.2-net:00606] tmp: /tmp
[powerbook.2-net:00606] [0,0,0] contact_file
/tmp/openmpi-sessions-motte_at_powerbook.2-net_0/default-universe/universe-setup.txt
[powerbook.2-net:00606] [0,0,0] wrote setup file
[powerbook.2-net:00606] pls:rsh: local csh: 1, local bash: 0
[powerbook.2-net:00606] pls:rsh: assuming same remote shell as local shell
[powerbook.2-net:00606] pls:rsh: remote csh: 1, remote bash: 0
[powerbook.2-net:00606] pls:rsh: final template argv:
[powerbook.2-net:00606] pls:rsh: /usr/bin/ssh <template> orted
--debug --bootproxy 1 --name <template> --num_procs 6 --vpid_start 0
--nodename <template> --universe motte_at_powerbook.2-net:default-universe
--nsreplica "0.0.0;tcp://192.168.2.3:49443" --gprreplica
"0.0.0;tcp://192.168.2.3:49443" --mpi-call-yield 0
[powerbook.2-net:00606] pls:rsh: launching on node Powerbook.2-net
[powerbook.2-net:00606] pls:rsh: not oversubscribed -- setting
mpi_yield_when_idle to 0
[powerbook.2-net:00606] pls:rsh: Powerbook.2-net is a LOCAL node
[powerbook.2-net:00606] pls:rsh: changing to directory /Users/motte
[powerbook.2-net:00606] pls:rsh: executing: orted --debug --bootproxy 1
--name 0.0.1 --num_procs 6 --vpid_start 0 --nodename Powerbook.2-net
--universe motte_at_powerbook.2-net:default-universe --nsreplica
"0.0.0;tcp://192.168.2.3:49443" --gprreplica
"0.0.0;tcp://192.168.2.3:49443" --mpi-call-yield 0
[powerbook.2-net:00607] [0,0,1] setting up session dir with
[powerbook.2-net:00607] universe default-universe
[powerbook.2-net:00607] user motte
[powerbook.2-net:00607] host Powerbook.2-net
[powerbook.2-net:00607] jobid 0
[powerbook.2-net:00607] procid 1
[powerbook.2-net:00607] procdir:
/tmp/openmpi-sessions-motte_at_Powerbook.2-net_0/default-universe/0/1
[powerbook.2-net:00607] jobdir:
/tmp/openmpi-sessions-motte_at_Powerbook.2-net_0/default-universe/0
[powerbook.2-net:00607] unidir:
/tmp/openmpi-sessions-motte_at_Powerbook.2-net_0/default-universe
[powerbook.2-net:00607] top: openmpi-sessions-motte_at_Powerbook.2-net_0
[powerbook.2-net:00607] tmp: /tmp
[powerbook.2-net:00606] pls:rsh: launching on node g4d003.3-net
[powerbook.2-net:00606] pls:rsh: not oversubscribed -- setting
mpi_yield_when_idle to 0
[powerbook.2-net:00606] pls:rsh: g4d003.3-net is a REMOTE node
[powerbook.2-net:00606] pls:rsh: executing: /usr/bin/ssh g4d003.3-net
orted --debug --bootproxy 1 --name 0.0.2 --num_procs 6 --vpid_start 0
--nodename g4d003.3-net --universe
motte_at_powerbook.2-net:default-universe --nsreplica
"0.0.0;tcp://192.168.2.3:49443" --gprreplica
"0.0.0;tcp://192.168.2.3:49443" --mpi-call-yield 0
[g4d003.3-net:00411] [0,0,2] setting up session dir with
[g4d003.3-net:00411] universe default-universe
[g4d003.3-net:00411] user motte
[g4d003.3-net:00411] host g4d003.3-net
[g4d003.3-net:00411] jobid 0
[g4d003.3-net:00411] procid 2
[g4d003.3-net:00411] procdir:
/tmp/openmpi-sessions-motte_at_g4d003.3-net_0/default-universe/0/2
[g4d003.3-net:00411] jobdir:
/tmp/openmpi-sessions-motte_at_g4d003.3-net_0/default-universe/0
[g4d003.3-net:00411] unidir:
/tmp/openmpi-sessions-motte_at_g4d003.3-net_0/default-universe
[g4d003.3-net:00411] top: openmpi-sessions-motte_at_g4d003.3-net_0
[g4d003.3-net:00411] tmp: /tmp
[powerbook.2-net:00606] pls:rsh: launching on node g4d002.3-net
[powerbook.2-net:00606] pls:rsh: not oversubscribed -- setting
mpi_yield_when_idle to 0
[powerbook.2-net:00606] pls:rsh: g4d002.3-net is a REMOTE node
[powerbook.2-net:00606] pls:rsh: executing: /usr/bin/ssh g4d002.3-net
orted --debug --bootproxy 1 --name 0.0.3 --num_procs 6 --vpid_start 0
--nodename g4d002.3-net --universe
motte_at_powerbook.2-net:default-universe --nsreplica
"0.0.0;tcp://192.168.2.3:49443" --gprreplica
"0.0.0;tcp://192.168.2.3:49443" --mpi-call-yield 0
[powerbook.2-net:00606] pls:rsh: launching on node g4d001.3-net
[powerbook.2-net:00606] pls:rsh: not oversubscribed -- setting
mpi_yield_when_idle to 0
[powerbook.2-net:00606] pls:rsh: g4d001.3-net is a REMOTE node
[powerbook.2-net:00606] pls:rsh: executing: /usr/bin/ssh g4d001.3-net
orted --debug --bootproxy 1 --name 0.0.4 --num_procs 6 --vpid_start 0
--nodename g4d001.3-net --universe
motte_at_powerbook.2-net:default-universe --nsreplica
"0.0.0;tcp://192.168.2.3:49443" --gprreplica
"0.0.0;tcp://192.168.2.3:49443" --mpi-call-yield 0
[powerbook.2-net:00606] pls:rsh: launching on node G5Dual.3-net
[powerbook.2-net:00606] pls:rsh: not oversubscribed -- setting
mpi_yield_when_idle to 0
[powerbook.2-net:00606] pls:rsh: G5Dual.3-net is a REMOTE node
[powerbook.2-net:00606] pls:rsh: executing: /usr/bin/ssh G5Dual.3-net
orted --debug --bootproxy 1 --name 0.0.5 --num_procs 6 --vpid_start 0
--nodename G5Dual.3-net --universe
motte_at_powerbook.2-net:default-universe --nsreplica
"0.0.0;tcp://192.168.2.3:49443" --gprreplica
"0.0.0;tcp://192.168.2.3:49443" --mpi-call-yield 0
[g4d001.3-net:00336] [0,0,4] setting up session dir with
[g4d001.3-net:00336] universe default-universe
[g4d001.3-net:00336] user motte
[g4d001.3-net:00336] host g4d001.3-net
[g4d001.3-net:00336] jobid 0
[g4d001.3-net:00336] procid 4
[g4d001.3-net:00336] procdir:
/tmp/openmpi-sessions-motte_at_g4d001.3-net_0/default-universe/0/4
[g4d001.3-net:00336] jobdir:
/tmp/openmpi-sessions-motte_at_g4d001.3-net_0/default-universe/0
[g4d001.3-net:00336] unidir:
/tmp/openmpi-sessions-motte_at_g4d001.3-net_0/default-universe
[g4d001.3-net:00336] top: openmpi-sessions-motte_at_g4d001.3-net_0
[g4d001.3-net:00336] tmp: /tmp
[g4d002.3-net:00279] [0,0,3] setting up session dir with
[g4d002.3-net:00279] universe default-universe
[g4d002.3-net:00279] user motte
[g4d002.3-net:00279] host g4d002.3-net
[g4d002.3-net:00279] jobid 0
[g4d002.3-net:00279] procid 3
[g4d002.3-net:00279] procdir:
/tmp/openmpi-sessions-motte_at_g4d002.3-net_0/default-universe/0/3
[g4d002.3-net:00279] jobdir:
/tmp/openmpi-sessions-motte_at_g4d002.3-net_0/default-universe/0
[g4d002.3-net:00279] unidir:
/tmp/openmpi-sessions-motte_at_g4d002.3-net_0/default-universe
[g4d002.3-net:00279] top: openmpi-sessions-motte_at_g4d002.3-net_0
[g4d002.3-net:00279] tmp: /tmp
[g5dual.3-net:00434] [0,0,5] setting up session dir with
[g5dual.3-net:00434] universe default-universe
[g5dual.3-net:00434] user motte
[g5dual.3-net:00434] host G5Dual.3-net
[g5dual.3-net:00434] jobid 0
[g5dual.3-net:00434] procid 5
[g5dual.3-net:00434] procdir:
/tmp/openmpi-sessions-motte_at_G5Dual.3-net_0/default-universe/0/5
[g5dual.3-net:00434] jobdir:
/tmp/openmpi-sessions-motte_at_G5Dual.3-net_0/default-universe/0
[g5dual.3-net:00434] unidir:
/tmp/openmpi-sessions-motte_at_G5Dual.3-net_0/default-universe
[g5dual.3-net:00434] top: openmpi-sessions-motte_at_G5Dual.3-net_0
[g5dual.3-net:00434] tmp: /tmp
[powerbook.2-net:00613] [0,1,4] setting up session dir with
[powerbook.2-net:00613] universe default-universe
[powerbook.2-net:00613] user motte
[powerbook.2-net:00613] host Powerbook.2-net
[powerbook.2-net:00613] jobid 1
[powerbook.2-net:00613] procid 4
[powerbook.2-net:00613] procdir:
/tmp/openmpi-sessions-motte_at_Powerbook.2-net_0/default-universe/1/4
[powerbook.2-net:00613] jobdir:
/tmp/openmpi-sessions-motte_at_Powerbook.2-net_0/default-universe/1
[powerbook.2-net:00613] unidir:
/tmp/openmpi-sessions-motte_at_Powerbook.2-net_0/default-universe
[powerbook.2-net:00613] top: openmpi-sessions-motte_at_Powerbook.2-net_0
[powerbook.2-net:00613] tmp: /tmp
[g5dual.3-net:00436] [0,1,0] setting up session dir with
[g5dual.3-net:00436] universe default-universe
[g5dual.3-net:00436] user motte
[g5dual.3-net:00436] host G5Dual.3-net
[g5dual.3-net:00436] jobid 1
[g5dual.3-net:00436] procid 0
[g5dual.3-net:00436] procdir:
/tmp/openmpi-sessions-motte_at_G5Dual.3-net_0/default-universe/1/0
[g5dual.3-net:00436] jobdir:
/tmp/openmpi-sessions-motte_at_G5Dual.3-net_0/default-universe/1
[g5dual.3-net:00436] unidir:
/tmp/openmpi-sessions-motte_at_G5Dual.3-net_0/default-universe
[g5dual.3-net:00436] top: openmpi-sessions-motte_at_G5Dual.3-net_0
[g5dual.3-net:00436] tmp: /tmp
[g4d001.3-net:00338] [0,1,1] setting up session dir with
[g4d001.3-net:00338] universe default-universe
[g4d001.3-net:00338] user motte
[g4d001.3-net:00338] host g4d001.3-net
[g4d001.3-net:00338] jobid 1
[g4d001.3-net:00338] procid 1
[g4d001.3-net:00338] procdir:
/tmp/openmpi-sessions-motte_at_g4d001.3-net_0/default-universe/1/1
[g4d001.3-net:00338] jobdir:
/tmp/openmpi-sessions-motte_at_g4d001.3-net_0/default-universe/1
[g4d001.3-net:00338] unidir:
/tmp/openmpi-sessions-motte_at_g4d001.3-net_0/default-universe
[g4d001.3-net:00338] top: openmpi-sessions-motte_at_g4d001.3-net_0
[g4d001.3-net:00338] tmp: /tmp
[g4d003.3-net:00413] [0,1,3] setting up session dir with
[g4d003.3-net:00413] universe default-universe
[g4d003.3-net:00413] user motte
[g4d003.3-net:00413] host g4d003.3-net
[g4d003.3-net:00413] jobid 1
[g4d003.3-net:00413] procid 3
[g4d003.3-net:00413] procdir:
/tmp/openmpi-sessions-motte_at_g4d003.3-net_0/default-universe/1/3
[g4d003.3-net:00413] jobdir:
/tmp/openmpi-sessions-motte_at_g4d003.3-net_0/default-universe/1
[g4d003.3-net:00413] unidir:
/tmp/openmpi-sessions-motte_at_g4d003.3-net_0/default-universe
[g4d003.3-net:00413] top: openmpi-sessions-motte_at_g4d003.3-net_0
[g4d003.3-net:00413] tmp: /tmp
[g4d002.3-net:00281] [0,1,2] setting up session dir with
[g4d002.3-net:00281] universe default-universe
[g4d002.3-net:00281] user motte
[g4d002.3-net:00281] host g4d002.3-net
[g4d002.3-net:00281] jobid 1
[g4d002.3-net:00281] procid 2
[g4d002.3-net:00281] procdir:
/tmp/openmpi-sessions-motte_at_g4d002.3-net_0/default-universe/1/2
[g4d002.3-net:00281] jobdir:
/tmp/openmpi-sessions-motte_at_g4d002.3-net_0/default-universe/1
[g4d002.3-net:00281] unidir:
/tmp/openmpi-sessions-motte_at_g4d002.3-net_0/default-universe
[g4d002.3-net:00281] top: openmpi-sessions-motte_at_g4d002.3-net_0
[g4d002.3-net:00281] tmp: /tmp
[powerbook.2-net:00606] spawn: in job_state_callback(jobid = 1, state = 0x4)
[powerbook.2-net:00606] Info: Setting up debugger process table for
applications
  MPIR_being_debugged = 0
  MPIR_debug_gate = 0
  MPIR_debug_state = 1
  MPIR_acquired_pre_main = 0
  MPIR_i_am_starter = 0
  MPIR_proctable_size = 5
  MPIR_proctable:
    (i, host, exe, pid) = (0, G5Dual.3-net,
/Network/CFD/hello/hello_world, 436)
    (i, host, exe, pid) = (1, g4d001.3-net,
/Network/CFD/hello/hello_world, 338)
    (i, host, exe, pid) = (2, g4d002.3-net,
/Network/CFD/hello/hello_world, 281)
    (i, host, exe, pid) = (3, g4d003.3-net,
/Network/CFD/hello/hello_world, 413)
    (i, host, exe, pid) = (4, Powerbook.2-net,
/Network/CFD/hello/hello_world, 613)
[powerbook.2-net:00613] [0,1,4] ompi_mpi_init completed
[g4d001.3-net:00338] [0,1,1] ompi_mpi_init completed
[g5dual.3-net:00436] [0,1,0] ompi_mpi_init completed
[g4d003.3-net:00413] [0,1,3] ompi_mpi_init completed
[g4d002.3-net:00281] [0,1,2] ompi_mpi_init completed
 node 1 :Hello, world
 node 2 :Hello, world node 3 :Hello, world
[g5dual.3-net:00436] *** An error occurred in MPI_Send

[g5dual.3-net:00436] *** on communicator MPI_COMM_WORLD
[g5dual.3-net:00436] *** MPI_ERR_INTERN: internal error
[g5dual.3-net:00436] *** MPI_ERRORS_ARE_FATAL (goodbye)
--------------------------------------------------------------------------
WARNING: A process refused to die!

Host: powerbook.2-net
PID: 613

This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: A process refused to die!

Host: g4d003.3-net
PID: 413

This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: A process refused to die!

Host: g5dual.3-net
PID: 436

This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: A process refused to die!

Host: g4d002.3-net
PID: 281

This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: A process refused to die!

Host: g4d001.3-net
PID: 338

This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
[g5dual.3-net:00434] sess_dir_finalize: found proc session dir empty -
deleting
[g5dual.3-net:00434] sess_dir_finalize: found job session dir empty -
deleting
[g5dual.3-net:00434] sess_dir_finalize: univ session dir not empty - leaving
[powerbook.2-net:00607] orted: job_state_callback(jobid = 1, state =
ORTE_PROC_STATE_ABORTED)
[g5dual.3-net:00434] orted: job_state_callback(jobid = 1, state =
ORTE_PROC_STATE_ABORTED)
[g4d003.3-net:00411] orted: job_state_callback(jobid = 1, state =
ORTE_PROC_STATE_ABORTED)
[g4d001.3-net:00336] orted: job_state_callback(jobid = 1, state =
ORTE_PROC_STATE_ABORTED)
[g5dual.3-net:00434] sess_dir_finalize: job session dir not empty - leaving
[g5dual.3-net:00434] sess_dir_finalize: found proc session dir empty -
deleting
[g5dual.3-net:00434] sess_dir_finalize: found job session dir empty -
deleting
[g5dual.3-net:00434] sess_dir_finalize: found univ session dir empty -
deleting
[g5dual.3-net:00434] sess_dir_finalize: found top session dir empty -
deleting
[g4d002.3-net:00279] orted: job_state_callback(jobid = 1, state =
ORTE_PROC_STATE_ABORTED)
[g4d002.3-net:00279] sess_dir_finalize: found job session dir empty -
deleting
[g4d002.3-net:00279] sess_dir_finalize: univ session dir not empty - leaving
[g4d002.3-net:00279] sess_dir_finalize: proc session dir not empty - leaving
--------------------------------------------------------------------------
WARNING: A process refused to die!

Host: g4d002.3-net
PID: 281

This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: A process refused to die!

Host: g4d002.3-net
PID: 281

This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
[g4d002.3-net:00279] sess_dir_finalize: found proc session dir empty -
deleting
[g4d002.3-net:00279] sess_dir_finalize: found job session dir empty -
deleting
[g4d002.3-net:00279] sess_dir_finalize: found univ session dir empty -
deleting
[g4d002.3-net:00279] sess_dir_finalize: found top session dir empty -
deleting
[powerbook.2-net:00607] sess_dir_finalize: found job session dir empty -
deleting
[powerbook.2-net:00607] sess_dir_finalize: univ session dir not empty -
leaving
[powerbook.2-net:00607] sess_dir_finalize: proc session dir not empty -
leaving
--------------------------------------------------------------------------
WARNING: A process refused to die!

Host: powerbook.2-net
PID: 613

This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: A process refused to die!

Host: powerbook.2-net
PID: 613

This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
[powerbook.2-net:00607] sess_dir_finalize: found proc session dir empty
- deleting
[powerbook.2-net:00607] sess_dir_finalize: job session dir not empty -
leaving
[g4d001.3-net:00336] sess_dir_finalize: found job session dir empty -
deleting
[g4d001.3-net:00336] sess_dir_finalize: univ session dir not empty - leaving
[g4d001.3-net:00336] sess_dir_finalize: proc session dir not empty - leaving
--------------------------------------------------------------------------
WARNING: A process refused to die!

Host: g4d001.3-net
PID: 338

This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: A process refused to die!

Host: g4d001.3-net
PID: 338

This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
[g4d001.3-net:00336] sess_dir_finalize: found proc session dir empty -
deleting
[g4d001.3-net:00336] sess_dir_finalize: found job session dir empty -
deleting
[g4d001.3-net:00336] sess_dir_finalize: found univ session dir empty -
deleting
[g4d001.3-net:00336] sess_dir_finalize: found top session dir empty -
deleting
[g4d003.3-net:00411] sess_dir_finalize: found job session dir empty -
deleting
[g4d003.3-net:00411] sess_dir_finalize: univ session dir not empty - leaving
[g4d003.3-net:00411] sess_dir_finalize: proc session dir not empty - leaving
--------------------------------------------------------------------------
WARNING: A process refused to die!

Host: g4d003.3-net
PID: 413

This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: A process refused to die!

Host: g4d003.3-net
PID: 413

This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
1 process killed (possibly by Open MPI)
[g4d003.3-net:00411] orted: job_state_callback(jobid = 1, state =
ORTE_PROC_STATE_TERMINATED)
[g4d003.3-net:00411] sess_dir_finalize: found proc session dir empty -
deleting
[g4d003.3-net:00411] sess_dir_finalize: found job session dir empty -
deleting
[g4d003.3-net:00411] sess_dir_finalize: found univ session dir empty -
deleting
[g4d003.3-net:00411] sess_dir_finalize: found top session dir empty -
deleting
[powerbook:/Network/CFD/hello] motte%