Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Working directory isn't set properly on Linux cluster
From: Todd Gamblin (tgamblin_at_[hidden])
Date: 2008-06-23 09:52:15


Thanks for pointing this out (I'm not sure how I got that wrong in the
test) -- making the test program do the right thing gives:

> (merle):test$ mpirun -np 4 test
> before MPI_Init:
> PWD: /home/tgamblin
> getcwd: /home/tgamblin/test
> before MPI_Init:
> PWD: /home/tgamblin
> getcwd: /home/tgamblin/test
>
> etc...
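
The fix, for reference, was just the "getcwd" line in testdir() -- it printed getenv("PWD") a second time instead of the buffer filled by getcwd(). A corrected sketch of the helper (the rest of the program quoted below is unchanged):

    #include <unistd.h>    // getcwd()
    #include <cstdlib>     // getenv()
    #include <iostream>
    #include <sstream>
    using namespace std;

    void testdir(const char *where) {
        char buf[1024];
        getcwd(buf, sizeof(buf));                      // the process's real working directory

        ostringstream tmp;
        tmp << where << ":" << endl
            << "\tPWD:\t"    << getenv("PWD") << endl  // environment variable, as inherited
            << "\tgetcwd:\t" << buf           << endl; // what the kernel reports
        cout << tmp.str();
    }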

-Todd

On Jun 23, 2008, at 5:03 AM, Jeff Squyres wrote:

> I think the issue here is that your test app is checking $PWD, not
> getcwd().
>
> If you call getcwd(), you'll get the right answer (see my tests
> below). But your point is noted that perhaps OMPI should be setting
> PWD to the correct value before launching the user app.
>
> [5:01] svbu-mpi:~/tmp % salloc -N 1 tcsh
> salloc: Granted job allocation 5311
> [5:01] svbu-mpi:~/tmp % mpirun -np 1 pwd
> /home/jsquyres/tmp
> [5:01] svbu-mpi:~/tmp % mpirun -np 1 -wdir ~/mpi pwd
> /home/jsquyres/mpi
> [5:01] svbu-mpi:~/tmp % cat foo.c
> #include <stdio.h>
> #include <unistd.h>
>
> int main() {
>     char buf[BUFSIZ];
>
>     getcwd(buf, BUFSIZ);
>     printf("CWD is %s\n", buf);
>     return 0;
> }
> [5:01] svbu-mpi:~/tmp % gcc foo.c -o foo
> [5:01] svbu-mpi:~/tmp % mpirun -np 1 foo
> CWD is /home/jsquyres/tmp
> [5:01] svbu-mpi:~/tmp % mpirun -np 1 -wdir ~/mpi ~/tmp/foo
> CWD is /home/jsquyres/mpi
> [5:01] svbu-mpi:~/tmp %
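
The reason the two disagree is that PWD is just an environment variable inherited from wherever the process was started, while getcwd() asks the kernel for the process's actual working directory; chdir() alone never updates PWD (shells do that themselves on cd). A rough sketch of the kind of thing Jeff suggests -- keeping the two consistent before launching the app -- purely hypothetical, not Open MPI's actual launcher code:

    #include <cstdio>
    #include <cstdlib>
    #include <unistd.h>

    // Hypothetical helper: move to the requested working directory and keep
    // $PWD in sync before exec'ing the user application.
    int launch_in_wdir(const char *wdir, char *const argv[]) {
        if (chdir(wdir) != 0) {            // change the real working directory
            perror("chdir");
            return -1;
        }
        if (setenv("PWD", wdir, 1) != 0) { // chdir() leaves $PWD stale; fix it up
            perror("setenv");
            return -1;
        }
        execvp(argv[0], argv);             // replace this process with the app
        perror("execvp");                  // only reached if exec fails
        return -1;
    }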
>
>
>
> On Jun 22, 2008, at 12:14 AM, Todd Gamblin wrote:
>
>> I'm having trouble getting Open MPI to set the working directory
>> properly when running jobs on a Linux cluster. I made a test
>> program (at end of post) that recreates the problem pretty well by
>> just printing out the results of getcwd(). Here's output both with
>> and without using -wdir:
>>
>>> (merle):~$ cd test
>>> (merle):test$ mpirun -np 2 test
>>> before MPI_Init:
>>> PWD: /home/tgamblin
>>> getcwd: /home/tgamblin
>>> before MPI_Init:
>>> PWD: /home/tgamblin
>>> getcwd: /home/tgamblin
>>> after MPI_Init:
>>> PWD: /home/tgamblin
>>> getcwd: /home/tgamblin
>>> after MPI_Init:
>>> PWD: /home/tgamblin
>>> getcwd: /home/tgamblin
>>> (merle):test$ mpirun -np 2 -wdir /home/tgamblin/test test
>>> before MPI_Init:
>>> PWD: /home/tgamblin
>>> getcwd: /home/tgamblin
>>> before MPI_Init:
>>> PWD: /home/tgamblin
>>> getcwd: /home/tgamblin
>>> after MPI_Init:
>>> PWD: /home/tgamblin
>>> getcwd: /home/tgamblin
>>> after MPI_Init:
>>> PWD: /home/tgamblin
>>> getcwd: /home/tgamblin
>>
>>
>> Shouldn't these print out /home/tgamblin/test? Also, this is even
>> stranger:
>>
>>> (merle):test$ mpirun -np 2 pwd
>>> /home/tgamblin/test
>>> /home/tgamblin/test
>>
>>
>> I feel like my program should output the same thing as pwd.
>>
>> I'm using Open MPI 1.2.6, and the cluster has 8 nodes with two
>> dual-core Woodcrest processors each (32 cores total). There are two
>> TCP networks on this cluster: one that the head node uses to talk to
>> the compute nodes, and one Gigabit network on which the compute nodes
>> can reach each other (but not the head node). I have
>> "btl_tcp_if_include = eth2" in my MCA params file to keep the
>> compute nodes talking to each other over the fast interconnect, and
>> I've pasted ifconfig output for the head node and for one compute
>> node below. Also, if it helps, the home directories on this machine
>> are mounted via autofs.
>>
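
For what it's worth, the per-user MCA parameters file Open MPI reads by default lives at $HOME/.openmpi/mca-params.conf (a system-wide one sits under the installation's etc/ directory); the relevant entry is just one line:

    # $HOME/.openmpi/mca-params.conf -- per-user MCA parameters
    # keep the TCP BTL on the Gigabit interface the compute nodes share
    btl_tcp_if_include = eth2
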
>> This is causing problems because I'm using apps that look for their
>> config files in the working directory. Please let me know if you
>> guys have any idea what's going on.
>>
>> Thanks!
>> -Todd
>>
>>
>> TEST PROGRAM:
>>> #include "mpi.h"
>>> #include <cstdlib>
>>> #include <iostream>
>>> #include <sstream>
>>> using namespace std;
>>>
>>> void testdir(const char *where) {
>>>     char buf[1024];
>>>     getcwd(buf, 1024);
>>>
>>>     ostringstream tmp;
>>>     tmp << where << ":" << endl
>>>         << "\tPWD:\t" << getenv("PWD") << endl
>>>         << "\tgetcwd:\t" << getenv("PWD") << endl;
>>>     cout << tmp.str();
>>> }
>>>
>>> int main(int argc, char **argv) {
>>>     testdir("before MPI_Init");
>>>     MPI_Init(&argc, &argv);
>>>     testdir("after MPI_Init");
>>>     MPI_Finalize();
>>> }
>>
>> HEAD NODE IFCONFIG:
>>> eth0 Link encap:Ethernet HWaddr 00:18:8B:2F:3D:90
>>> inet addr:10.6.1.1 Bcast:10.6.1.255 Mask:255.255.255.0
>>> inet6 addr: fe80::218:8bff:fe2f:3d90/64 Scope:Link
>>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>>> RX packets:1579250319 errors:0 dropped:0 overruns:0 frame:0
>>> TX packets:874273636 errors:0 dropped:0 overruns:0 carrier:0
>>> collisions:0 txqueuelen:1000
>>> RX bytes:2361367146846 (2.1 TiB) TX bytes:85373933521 (79.5 GiB)
>>> Interrupt:169 Memory:f4000000-f4011100
>>>
>>> eth0:1 Link encap:Ethernet HWaddr 00:18:8B:2F:3D:90
>>> inet addr:10.6.2.1 Bcast:10.6.2.255 Mask:255.255.255.0
>>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>>> Interrupt:169 Memory:f4000000-f4011100
>>>
>>> eth1 Link encap:Ethernet HWaddr 00:18:8B:2F:3D:8E
>>> inet addr:152.54.1.21 Bcast:152.54.3.255 Mask:255.255.252.0
>>> inet6 addr: fe80::218:8bff:fe2f:3d8e/64 Scope:Link
>>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>>> RX packets:14436523 errors:0 dropped:0 overruns:0 frame:0
>>> TX packets:7357596 errors:0 dropped:0 overruns:0 carrier:0
>>> collisions:0 txqueuelen:1000
>>> RX bytes:2354451258 (2.1 GiB) TX bytes:2218390772 (2.0 GiB)
>>> Interrupt:169 Memory:f8000000-f8011100
>>>
>>> lo Link encap:Local Loopback
>>> inet addr:127.0.0.1 Mask:255.0.0.0
>>> inet6 addr: ::1/128 Scope:Host
>>> UP LOOPBACK RUNNING MTU:16436 Metric:1
>>> RX packets:540889623 errors:0 dropped:0 overruns:0 frame:0
>>> TX packets:540889623 errors:0 dropped:0 overruns:0 carrier:0
>>> collisions:0 txqueuelen:0
>>> RX bytes:63787539844 (59.4 GiB) TX bytes:63787539844 (59.4 GiB)
>>>
>>
>> COMPUTE NODE IFCONFIG:
>>> (compute-0-0):~$ ifconfig
>>> eth0 Link encap:Ethernet HWaddr 00:13:72:FA:42:ED
>>> inet addr:10.6.1.254 Bcast:10.6.1.255 Mask:255.255.255.0
>>> inet6 addr: fe80::213:72ff:fefa:42ed/64 Scope:Link
>>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>>> RX packets:200637 errors:0 dropped:0 overruns:0 frame:0
>>> TX packets:165336 errors:0 dropped:0 overruns:0 carrier:0
>>> collisions:0 txqueuelen:1000
>>> RX bytes:187105568 (178.4 MiB) TX bytes:26263945 (25.0 MiB)
>>> Interrupt:169 Memory:f8000000-f8011100
>>>
>>> eth2 Link encap:Ethernet HWaddr 00:15:17:0E:9E:68
>>> inet addr:10.6.2.254 Bcast:10.6.2.255 Mask:255.255.255.0
>>> inet6 addr: fe80::215:17ff:fe0e:9e68/64 Scope:Link
>>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>>> RX packets:20 errors:0 dropped:0 overruns:0 frame:0
>>> TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
>>> collisions:0 txqueuelen:1000
>>> RX bytes:1280 (1.2 KiB) TX bytes:590 (590.0 b)
>>> Base address:0xdce0 Memory:fc3e0000-fc400000
>>>
>>> lo Link encap:Local Loopback
>>> inet addr:127.0.0.1 Mask:255.0.0.0
>>> inet6 addr: ::1/128 Scope:Host
>>> UP LOOPBACK RUNNING MTU:16436 Metric:1
>>> RX packets:65 errors:0 dropped:0 overruns:0 frame:0
>>> TX packets:65 errors:0 dropped:0 overruns:0 carrier:0
>>> collisions:0 txqueuelen:0
>>> RX bytes:4376 (4.2 KiB) TX bytes:4376 (4.2 KiB)
>>>
>>
>>
>
>
> --
> Jeff Squyres
> Cisco Systems
>