
Open MPI User's Mailing List Archives


From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2006-10-13 17:07:04


The problem is that we haven't done a whole lot of testing yet for
multiple thread support. In particular, OMPI was fundamentally
designed for both THREAD_MULTIPLE and progression thread support, and
several groups have done enough testing to ensure that, when compiled
with multiple thread support, OMPI doesn't simply hang. However:

- I don't know if we have tested the progress thread support in quite
a long time, and
- Even though OMPI has been checked to ensure that we don't have
boneheaded extra locks/unlocks, not a whole lot of testing has
occurred to ensure that THREAD_MULTIPLE support is completely solid.

As such, I'd be surprised if THREAD_MULTIPLE works for any
applications that do anything reasonably "interesting" with multiple
threads and MPI (it pains me to say this, but I'd rather be honest
than string you along! :-( ).

That being said, THREAD_MULTIPLE support is going to become more
relevant in the next several months (i.e., various organizations have
a vested interest in THREAD_MULTIPLE and the work will resume in this
area).
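
As an aside, a quick way to see what thread level a given build will
actually grant is to request MPI_THREAD_MULTIPLE via MPI_Init_thread()
and look at the "provided" value it hands back. A minimal sketch (my
own illustration, not from Matt's application):

  /* request the highest thread level and report what was granted */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int provided, rank;

      /* "provided" may come back lower than requested, e.g.
         MPI_THREAD_SINGLE, if OMPI was built without thread support */
      MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      if (rank == 0 && provided < MPI_THREAD_MULTIPLE) {
          printf("requested MPI_THREAD_MULTIPLE, got level %d\n", provided);
      }

      MPI_Finalize();
      return 0;
  }

If "provided" comes back lower than MPI_THREAD_MULTIPLE, the
application should not call MPI from multiple threads concurrently,
regardless of what level it asked for.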

On Oct 12, 2006, at 7:02 AM, Cupp, Matthew R wrote:

> Hi,
>
> I ran the program with the debug flag (-d) and got the following
> (hopefully it helps)...
>
> Thanks,
> Matt
>
>
> [master:32399] [0,0,0] setting up session dir with
> [master:32399] universe default-universe
> [master:32399] user cuppm
> [master:32399] host master
> [master:32399] jobid 0
> [master:32399] procid 0
> [master:32399] procdir:
> /tmp/openmpi-sessions-cuppm_at_master_0/default-universe/0/0
> [master:32399] jobdir:
> /tmp/openmpi-sessions-cuppm_at_master_0/default-universe/0
> [master:32399] unidir:
> /tmp/openmpi-sessions-cuppm_at_master_0/default-universe
> [master:32399] top: openmpi-sessions-cuppm_at_master_0
> [master:32399] tmp: /tmp
> [master:32399] [0,0,0] contact_file /tmp/openmpi-sessions-cuppm_at_master_0/default-universe/universe-setup.txt
> [master:32399] [0,0,0] wrote setup file
> [master:32399] pls:rsh: local csh: 0, local bash: 1
> [master:32399] pls:rsh: assuming same remote shell as local shell
> [master:32399] pls:rsh: remote csh: 0, remote bash: 1
> [master:32399] pls:rsh: final template argv:
> [master:32399] pls:rsh: /usr/bin/ssh <template> orted --debug --bootproxy 1 --name <template> --num_procs 4 --vpid_start 0 --nodename <template> --universe cuppm_at_master:default-universe --nsreplica "0.0.0;tcp://192.168.1.254:6331;tcp://131.167.49.200:6331" --gprreplica "0.0.0;tcp://192.168.1.254:6331;tcp://131.167.49.200:6331" --mpi-call-yield 0
> [master:32399] pls:rsh: launching on node node02
> [master:32399] pls:rsh: not oversubscribed -- setting
> mpi_yield_when_idle to 0
> [master:32399] pls:rsh: node02 is a REMOTE node
> [master:32399] pls:rsh: executing: /usr/bin/ssh node02 PATH=/opt/openmpi/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; /opt/openmpi/bin/orted --debug --bootproxy 1 --name 0.0.1 --num_procs 4 --vpid_start 0 --nodename node02 --universe cuppm_at_master:default-universe --nsreplica "0.0.0;tcp://192.168.1.254:6331;tcp://131.167.49.200:6331" --gprreplica "0.0.0;tcp://192.168.1.254:6331;tcp://131.167.49.200:6331" --mpi-call-yield 0
> [node02:05515] [0,0,1] setting up session dir with
> [node02:05515] universe default-universe
> [node02:05515] user cuppm
> [node02:05515] host node02
> [node02:05515] jobid 0
> [node02:05515] procid 1
> [node02:05515] procdir:
> /tmp/openmpi-sessions-cuppm_at_node02_0/default-universe/0/1
> [node02:05515] jobdir:
> /tmp/openmpi-sessions-cuppm_at_node02_0/default-universe/0
> [node02:05515] unidir:
> /tmp/openmpi-sessions-cuppm_at_node02_0/default-universe
> [node02:05515] top: openmpi-sessions-cuppm_at_node02_0
> [node02:05515] tmp: /tmp
> [master:32399] pls:rsh: launching on node node01
> [master:32399] pls:rsh: not oversubscribed -- setting
> mpi_yield_when_idle to 0
> [master:32399] pls:rsh: node01 is a REMOTE node
> [master:32399] pls:rsh: executing: /usr/bin/ssh node01 PATH=/opt/openmpi/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; /opt/openmpi/bin/orted --debug --bootproxy 1 --name 0.0.2 --num_procs 4 --vpid_start 0 --nodename node01 --universe cuppm_at_master:default-universe --nsreplica "0.0.0;tcp://192.168.1.254:6331;tcp://131.167.49.200:6331" --gprreplica "0.0.0;tcp://192.168.1.254:6331;tcp://131.167.49.200:6331" --mpi-call-yield 0
> [node01:15482] [0,0,2] setting up session dir with
> [node01:15482] universe default-universe
> [node01:15482] user cuppm
> [node01:15482] host node01
> [node01:15482] jobid 0
> [node01:15482] procid 2
> [node01:15482] procdir:
> /tmp/openmpi-sessions-cuppm_at_node01_0/default-universe/0/2
> [node01:15482] jobdir:
> /tmp/openmpi-sessions-cuppm_at_node01_0/default-universe/0
> [node01:15482] unidir:
> /tmp/openmpi-sessions-cuppm_at_node01_0/default-universe
> [node01:15482] top: openmpi-sessions-cuppm_at_node01_0
> [node01:15482] tmp: /tmp
> [master:32399] pls:rsh: launching on node master
> [master:32399] pls:rsh: not oversubscribed -- setting
> mpi_yield_when_idle to 0
> [master:32399] pls:rsh: master is a LOCAL node
> [master:32399] pls:rsh: reset PATH: /opt/openmpi/bin:/opt/maui/bin:/opt/torque/bin:/opt/bin:/opt/hdfview/bin:/opt/hdf/bin:/opt/ncarg/bin:/opt/mpich/p4-gnu/bin:/opt/mpiexec//bin:/usr/kerberos/bin:/opt/java/jdk1.5.0/bin:/usr/lib64/ccache/bin:/usr/local/bin:/bin:/usr/bin:/opt/java/jdk1.5.0/jre/bin
> [master:32399] pls:rsh: reset LD_LIBRARY_PATH: /opt/openmpi/lib:/usr/lib/jvm/jdk1.5.0_08/jre/lib/amd64/server:/usr/lib/jvm/jdk1.5.0_08/jre/lib/amd64:/usr/lib/jvm/jdk1.5.0_08/jre/../lib/amd64:/opt/mpich/p4-gnu/lib:/usr/lib64/mozilla-1.7.13
> [master:32399] pls:rsh: changing to directory /home/cuppm
> [master:32399] pls:rsh: executing: orted --debug --bootproxy 1 --name 0.0.3 --num_procs 4 --vpid_start 0 --nodename master --universe cuppm_at_master:default-universe --nsreplica "0.0.0;tcp://192.168.1.254:6331;tcp://131.167.49.200:6331" --gprreplica "0.0.0;tcp://192.168.1.254:6331;tcp://131.167.49.200:6331" --mpi-call-yield 0
> [master:32408] [0,0,3] setting up session dir with
> [master:32408] universe default-universe
> [master:32408] user cuppm
> [master:32408] host master
> [master:32408] jobid 0
> [master:32408] procid 3
> [master:32408] procdir:
> /tmp/openmpi-sessions-cuppm_at_master_0/default-universe/0/3
> [master:32408] jobdir:
> /tmp/openmpi-sessions-cuppm_at_master_0/default-universe/0
> [master:32408] unidir:
> /tmp/openmpi-sessions-cuppm_at_master_0/default-universe
> [master:32408] top: openmpi-sessions-cuppm_at_master_0
> [master:32408] tmp: /tmp
> Calling Init...
> [master:32410] [0,1,0] setting up session dir with
> [master:32410] universe default-universe
> [master:32410] user cuppm
> [master:32410] host master
> [master:32410] jobid 1
> [master:32410] procid 0
> [master:32410] procdir:
> /tmp/openmpi-sessions-cuppm_at_master_0/default-universe/1/0
> [master:32410] jobdir:
> /tmp/openmpi-sessions-cuppm_at_master_0/default-universe/1
> [master:32410] unidir:
> /tmp/openmpi-sessions-cuppm_at_master_0/default-universe
> [master:32410] top: openmpi-sessions-cuppm_at_master_0
> [master:32410] tmp: /tmp
> Calling Init...
> Calling Init...
> [node02:05517] [0,1,2] setting up session dir with
> [node02:05517] universe default-universe
> [node02:05517] user cuppm
> [node02:05517] host node02
> [node02:05517] jobid 1
> [node02:05517] procid 2
> [node02:05517] procdir:
> /tmp/openmpi-sessions-cuppm_at_node02_0/default-universe/1/2
> [node02:05517] jobdir:
> /tmp/openmpi-sessions-cuppm_at_node02_0/default-universe/1
> [node02:05517] unidir:
> /tmp/openmpi-sessions-cuppm_at_node02_0/default-universe
> [node02:05517] top: openmpi-sessions-cuppm_at_node02_0
> [node02:05517] tmp: /tmp
> [node01:15484] [0,1,1] setting up session dir with
> [node01:15484] universe default-universe
> [node01:15484] user cuppm
> [node01:15484] host node01
> [node01:15484] jobid 1
> [node01:15484] procid 1
> [node01:15484] procdir:
> /tmp/openmpi-sessions-cuppm_at_node01_0/default-universe/1/1
> [node01:15484] jobdir:
> /tmp/openmpi-sessions-cuppm_at_node01_0/default-universe/1
> [node01:15484] unidir:
> /tmp/openmpi-sessions-cuppm_at_node01_0/default-universe
> [node01:15484] top: openmpi-sessions-cuppm_at_node01_0
> [node01:15484] tmp: /tmp
> [node02:05517] mca_btl_sm_component_init: mkfifo failed with errno=17
> [node01:15484] mca_btl_sm_component_init: mkfifo failed with errno=17
> [master:32399] spawn: in job_state_callback(jobid = 1, state = 0x4)
> [master:32399] Info: Setting up debugger process table for applications
> MPIR_being_debugged = 0
> MPIR_debug_gate = 0
> MPIR_debug_state = 1
> MPIR_acquired_pre_main = 0
> MPIR_i_am_starter = 0
> MPIR_proctable_size = 3
> MPIR_proctable:
> (i, host, exe, pid) = (0, master, /home/cuppm/workspace/MpiTest/Debug/MpiTest, 32410)
> (i, host, exe, pid) = (1, node01, /home/cuppm/workspace/MpiTest/Debug/MpiTest, 15484)
> (i, host, exe, pid) = (2, node02, /home/cuppm/workspace/MpiTest/Debug/MpiTest, 5517)
> ______________________________
> Matt Cupp
> Battelle Memorial Institute
> Statistics and Information Analysis
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems