Open MPI User's Mailing List Archives

From: Marty Humphrey (humphrey_at_[hidden])
Date: 2005-11-01 12:02:24


Hi,

I want to use Open MPI across two machines, each of which has more than one
NIC:

wukong: eth0 (152.48.249.102, no MPI traffic), eth1 (128.109.34.20, yes MPI
traffic)

zelda01: eth0 (130.207.252.131, yes MPI traffic), eth2 (10.0.0.12, no MPI
traffic)

On wukong, I have:
[humphrey_at_wukong ~]$ more ~/.openmpi/mca-params.conf
btl_tcp_if_include=eth1

On zelda01, I have:
[humphrey_at_zelda01 humphrey]$ more ~/.openmpi/mca-params.conf
btl_tcp_if_include=eth0

Below my signature is the output I get when I run the job from wukong
(128.109.34.20). It hangs at the point where the output ends; I believe the
remote machine (zelda01) is trying to contact wukong on the inaccessible
interface (152.48.249.102). This is with openmpi-1.0rc5r7944.

What am I doing wrong?

Thanks,
Marty

Marty Humphrey
Assistant Professor
Department of Computer Science
University of Virginia

[humphrey_at_wukong ~]$ mpirun -d --mca btl tcp --host 128.109.34.20,130.207.252.131 -np 2 a.out
[wukong.ncren.net:17236] [0,0,0] setting up session dir with
[wukong.ncren.net:17236] universe default-universe
[wukong.ncren.net:17236] user humphrey
[wukong.ncren.net:17236] host wukong.ncren.net
[wukong.ncren.net:17236] jobid 0
[wukong.ncren.net:17236] procid 0
[wukong.ncren.net:17236] procdir: /tmp/openmpi-sessions-humphrey_at_[hidden]_0/default-universe/0/0
[wukong.ncren.net:17236] jobdir: /tmp/openmpi-sessions-humphrey_at_[hidden]_0/default-universe/0
[wukong.ncren.net:17236] unidir: /tmp/openmpi-sessions-humphrey_at_[hidden]_0/default-universe
[wukong.ncren.net:17236] top: openmpi-sessions-humphrey_at_[hidden]_0
[wukong.ncren.net:17236] tmp: /tmp
[wukong.ncren.net:17236] [0,0,0] contact_file /tmp/openmpi-sessions-humphrey_at_[hidden]_0/default-universe/universe-setup.txt
[wukong.ncren.net:17236] [0,0,0] wrote setup file
[wukong.ncren.net:17236] pls:rsh: local csh: 0, local bash: 1
[wukong.ncren.net:17236] pls:rsh: assuming same remote shell as local shell
[wukong.ncren.net:17236] pls:rsh: remote csh: 0, remote bash: 1
[wukong.ncren.net:17236] pls:rsh: final template argv:
[wukong.ncren.net:17236] pls:rsh: ssh <template> orted --debug --bootproxy 1 --name <template> --num_procs 3 --vpid_start 0 --nodename <template> --universe humphrey_at_[hidden]:default-universe --nsreplica "0.0.0;tcp://152.48.249.102:33964;tcp://128.109.34.20:33964" --gprreplica "0.0.0;tcp://152.48.249.102:33964;tcp://128.109.34.20:33964" --mpi-call-yield 0
[wukong.ncren.net:17236] pls:rsh: launching on node 128.109.34.20
[wukong.ncren.net:17236] pls:rsh: not oversubscribed -- setting mpi_yield_when_idle to 0
[wukong.ncren.net:17236] pls:rsh: 128.109.34.20 is a LOCAL node
[wukong.ncren.net:17236] pls:rsh: executing: orted --debug --bootproxy 1 --name 0.0.1 --num_procs 3 --vpid_start 0 --nodename 128.109.34.20 --universe humphrey_at_[hidden]:default-universe --nsreplica "0.0.0;tcp://152.48.249.102:33964;tcp://128.109.34.20:33964" --gprreplica "0.0.0;tcp://152.48.249.102:33964;tcp://128.109.34.20:33964" --mpi-call-yield 0
[wukong.ncren.net:17237] [0,0,1] setting up session dir with
[wukong.ncren.net:17237] universe default-universe
[wukong.ncren.net:17237] user humphrey
[wukong.ncren.net:17237] host 128.109.34.20
[wukong.ncren.net:17237] jobid 0
[wukong.ncren.net:17237] procid 1
[wukong.ncren.net:17237] procdir: /tmp/openmpi-sessions-humphrey_at_128.109.34.20_0/default-universe/0/1
[wukong.ncren.net:17237] jobdir: /tmp/openmpi-sessions-humphrey_at_128.109.34.20_0/default-universe/0
[wukong.ncren.net:17237] unidir: /tmp/openmpi-sessions-humphrey_at_128.109.34.20_0/default-universe
[wukong.ncren.net:17237] top: openmpi-sessions-humphrey_at_128.109.34.20_0
[wukong.ncren.net:17237] tmp: /tmp
[wukong.ncren.net:17236] pls:rsh: launching on node 130.207.252.131
[wukong.ncren.net:17236] pls:rsh: not oversubscribed -- setting mpi_yield_when_idle to 0
[wukong.ncren.net:17236] pls:rsh: 130.207.252.131 is a REMOTE node
[wukong.ncren.net:17236] pls:rsh: executing: ssh 130.207.252.131 orted --debug --bootproxy 1 --name 0.0.2 --num_procs 3 --vpid_start 0 --nodename 130.207.252.131 --universe humphrey_at_[hidden]:default-universe --nsreplica "0.0.0;tcp://152.48.249.102:33964;tcp://128.109.34.20:33964" --gprreplica "0.0.0;tcp://152.48.249.102:33964;tcp://128.109.34.20:33964" --mpi-call-yield 0
[zelda01.localdomain:08631] [0,0,2] setting up session dir with
[zelda01.localdomain:08631] universe default-universe
[zelda01.localdomain:08631] user humphrey
[zelda01.localdomain:08631] host 130.207.252.131
[zelda01.localdomain:08631] jobid 0
[zelda01.localdomain:08631] procid 2
[zelda01.localdomain:08631] procdir: /tmp/openmpi-sessions-humphrey_at_130.207.252.131_0/default-universe/0/2
[zelda01.localdomain:08631] jobdir: /tmp/openmpi-sessions-humphrey_at_130.207.252.131_0/default-universe/0
[zelda01.localdomain:08631] unidir: /tmp/openmpi-sessions-humphrey_at_130.207.252.131_0/default-universe
[zelda01.localdomain:08631] top: openmpi-sessions-humphrey_at_130.207.252.131_0
[zelda01.localdomain:08631] tmp: /tmp
[wukong.ncren.net:17239] [0,1,0] setting up session dir with
[wukong.ncren.net:17239] universe default-universe
[wukong.ncren.net:17239] user humphrey
[wukong.ncren.net:17239] host 128.109.34.20
[wukong.ncren.net:17239] jobid 1
[wukong.ncren.net:17239] procid 0
[wukong.ncren.net:17239] procdir: /tmp/openmpi-sessions-humphrey_at_128.109.34.20_0/default-universe/1/0
[wukong.ncren.net:17239] jobdir: /tmp/openmpi-sessions-humphrey_at_128.109.34.20_0/default-universe/1
[wukong.ncren.net:17239] unidir: /tmp/openmpi-sessions-humphrey_at_128.109.34.20_0/default-universe
[wukong.ncren.net:17239] top: openmpi-sessions-humphrey_at_128.109.34.20_0
[wukong.ncren.net:17239] tmp: /tmp
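
The --nsreplica and --gprreplica values in the output above list two tcp://
addresses for wukong, one of which is 152.48.249.102 (eth0, not meant for MPI
traffic). As a rough way to check the suspicion stated earlier, the Python
sketch below pulls the addresses out of such a string and reports any that are
not in the intended set. The function names and MPI_ADDRESSES are made up for
illustration and are not part of Open MPI; the sketch only inspects the string
and does not show which address the remote daemon actually tries.

```python
import re

# Addresses intended to carry MPI traffic, taken from the interface list
# in the post (wukong eth1, zelda01 eth0). Hypothetical helper constant.
MPI_ADDRESSES = {"128.109.34.20", "130.207.252.131"}


def advertised_addresses(contact):
    """Return the IPv4 addresses of all tcp://host:port entries, in order."""
    return re.findall(r"tcp://(\d{1,3}(?:\.\d{1,3}){3}):\d+", contact)


def unexpected_addresses(contact, allowed):
    """Return advertised addresses that are not in the allowed set."""
    return [addr for addr in advertised_addresses(contact) if addr not in allowed]


if __name__ == "__main__":
    # Value of --nsreplica copied from the debug output above.
    nsreplica = "0.0.0;tcp://152.48.249.102:33964;tcp://128.109.34.20:33964"
    print(advertised_addresses(nsreplica))
    print(unexpected_addresses(nsreplica, MPI_ADDRESSES))
```

For the string above, the second call reports 152.48.249.102, which is
consistent with (but does not prove) the guess that the remote side is being
handed the eth0 address.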