
Open MPI User's Mailing List Archives


Subject: [OMPI users] Working directory isn't set properly on Linux cluster
From: Todd Gamblin (tgamblin_at_[hidden])
Date: 2008-06-22 00:14:25


I'm having trouble getting Open MPI to set the working directory
properly when running jobs on a Linux cluster. I made a test program
(at the end of this post) that reproduces the problem by printing
out the results of getcwd(). Here's the output both with and
without -wdir:

> (merle):~$ cd test
> (merle):test$ mpirun -np 2 test
> before MPI_Init:
> PWD: /home/tgamblin
> getcwd: /home/tgamblin
> before MPI_Init:
> PWD: /home/tgamblin
> getcwd: /home/tgamblin
> after MPI_Init:
> PWD: /home/tgamblin
> getcwd: /home/tgamblin
> after MPI_Init:
> PWD: /home/tgamblin
> getcwd: /home/tgamblin
> (merle):test$ mpirun -np 2 -wdir /home/tgamblin/test test
> before MPI_Init:
> PWD: /home/tgamblin
> getcwd: /home/tgamblin
> before MPI_Init:
> PWD: /home/tgamblin
> getcwd: /home/tgamblin
> after MPI_Init:
> PWD: /home/tgamblin
> getcwd: /home/tgamblin
> after MPI_Init:
> PWD: /home/tgamblin
> getcwd: /home/tgamblin

Shouldn't these print out /home/tgamblin/test? Also, this is even
stranger:

> (merle):test$ mpirun -np 2 pwd
> /home/tgamblin/test
> /home/tgamblin/test

I feel like my program should output the same thing as pwd.

I'm using Open MPI 1.2.6. The cluster has 8 nodes, each with two
dual-core Woodcrest processors (32 cores total). There are two TCP
networks on this cluster: one that the head node uses to talk to the
compute nodes, and a Gigabit network on which the compute nodes can
reach each other (but not the head node). I have
"btl_tcp_if_include = eth2" in my mca params file to keep the compute
nodes on the fast interconnect when talking to each other, and I've
pasted ifconfig output for the head node and for one compute node
below. Also, if it helps, the home directories on this machine are
mounted via autofs.
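For reference, the mca params entry mentioned above looks like this; the per-user file location shown is an assumption based on a default install (a system-wide copy normally lives under the install prefix's etc/ directory):

```shell
# ~/.openmpi/mca-params.conf -- per-user MCA parameter file
# Restrict the TCP BTL to the compute-node interconnect interface:
btl_tcp_if_include = eth2

# The same parameter can also be passed on the command line instead:
#   mpirun --mca btl_tcp_if_include eth2 -np 2 ./test
```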

This is causing problems because I'm using apps that look for their
config file in the working directory. Please let me know if you have
any idea what's going on.

Thanks!
-Todd

TEST PROGRAM:
> #include "mpi.h"
> #include <unistd.h>   // for getcwd()
> #include <cstdlib>
> #include <iostream>
> #include <sstream>
> using namespace std;
>
> void testdir(const char *where) {
>     char buf[1024];
>     getcwd(buf, 1024);
>
>     ostringstream tmp;
>     tmp << where << ":" << endl
>         << "\tPWD:\t" << getenv("PWD") << endl
>         << "\tgetcwd:\t" << getenv("PWD") << endl;
>     cout << tmp.str();
> }
>
> int main(int argc, char **argv) {
>     testdir("before MPI_Init");
>     MPI_Init(&argc, &argv);
>     testdir("after MPI_Init");
>     MPI_Finalize();
>     return 0;
> }

HEAD NODE IFCONFIG:
> eth0 Link encap:Ethernet HWaddr 00:18:8B:2F:3D:90
> inet addr:10.6.1.1 Bcast:10.6.1.255 Mask:255.255.255.0
> inet6 addr: fe80::218:8bff:fe2f:3d90/64 Scope:Link
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:1579250319 errors:0 dropped:0 overruns:0 frame:0
> TX packets:874273636 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:2361367146846 (2.1 TiB) TX bytes:85373933521 (79.5 GiB)
> Interrupt:169 Memory:f4000000-f4011100
>
> eth0:1 Link encap:Ethernet HWaddr 00:18:8B:2F:3D:90
> inet addr:10.6.2.1 Bcast:10.6.2.255 Mask:255.255.255.0
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> Interrupt:169 Memory:f4000000-f4011100
>
> eth1 Link encap:Ethernet HWaddr 00:18:8B:2F:3D:8E
> inet addr:152.54.1.21 Bcast:152.54.3.255 Mask:255.255.252.0
> inet6 addr: fe80::218:8bff:fe2f:3d8e/64 Scope:Link
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:14436523 errors:0 dropped:0 overruns:0 frame:0
> TX packets:7357596 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:2354451258 (2.1 GiB) TX bytes:2218390772 (2.0 GiB)
> Interrupt:169 Memory:f8000000-f8011100
>
> lo Link encap:Local Loopback
> inet addr:127.0.0.1 Mask:255.0.0.0
> inet6 addr: ::1/128 Scope:Host
> UP LOOPBACK RUNNING MTU:16436 Metric:1
> RX packets:540889623 errors:0 dropped:0 overruns:0 frame:0
> TX packets:540889623 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:0
> RX bytes:63787539844 (59.4 GiB) TX bytes:63787539844 (59.4 GiB)
>

COMPUTE NODE IFCONFIG:
> (compute-0-0):~$ ifconfig
> eth0 Link encap:Ethernet HWaddr 00:13:72:FA:42:ED
> inet addr:10.6.1.254 Bcast:10.6.1.255 Mask:255.255.255.0
> inet6 addr: fe80::213:72ff:fefa:42ed/64 Scope:Link
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:200637 errors:0 dropped:0 overruns:0 frame:0
> TX packets:165336 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:187105568 (178.4 MiB) TX bytes:26263945 (25.0 MiB)
> Interrupt:169 Memory:f8000000-f8011100
>
> eth2 Link encap:Ethernet HWaddr 00:15:17:0E:9E:68
> inet addr:10.6.2.254 Bcast:10.6.2.255 Mask:255.255.255.0
> inet6 addr: fe80::215:17ff:fe0e:9e68/64 Scope:Link
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:20 errors:0 dropped:0 overruns:0 frame:0
> TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:1280 (1.2 KiB) TX bytes:590 (590.0 b)
> Base address:0xdce0 Memory:fc3e0000-fc400000
>
> lo Link encap:Local Loopback
> inet addr:127.0.0.1 Mask:255.0.0.0
> inet6 addr: ::1/128 Scope:Host
> UP LOOPBACK RUNNING MTU:16436 Metric:1
> RX packets:65 errors:0 dropped:0 overruns:0 frame:0
> TX packets:65 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:0
> RX bytes:4376 (4.2 KiB) TX bytes:4376 (4.2 KiB)
>