Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] 1.4.5rc2 testing linux/ppc/IBM [SOLVED]
From: Paul H. Hargrove (PHHargrove_at_[hidden])
Date: 2012-01-27 15:18:38


On 1/27/2012 5:24 AM, Jeff Squyres wrote:
> On Jan 27, 2012, at 12:45 AM, Paul H. Hargrove wrote:
>
>> On this cluster, statfs() is returning ENOENT, which is breaking opal_path_nfs().
>> So, these results are with test/opal/util/opal_path_nfs.c "disabled".
> Paul -- can you explain this a little more? There should be logic in there to effectively handle ENOENT's, meaning that if we get a non-ESTALE error, we try again with the directory name. This is repeated until we get to "/" -- so there should definitely be at least one case where statfs() is *not* returning ENOENT.
>
> Is that not happening?
>
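Jeff's description of the fallback (retry while statfs() fails with ESTALE, otherwise try again with the parent directory until "/" is reached) can be sketched roughly as follows. This is a minimal illustration that assumes Linux's statfs() from <sys/vfs.h>. The helper name statfs_walk_up, the buffer size, and the retry count are hypothetical and are not taken from opal/util/path.c.

```c
#include <errno.h>
#include <string.h>
#include <sys/vfs.h>   /* Linux statfs(); the header and call differ on other systems */

/* Hypothetical helper, NOT the actual opal_path_nfs() code.
 * Retries statfs() while it fails with ESTALE. On any other error (or once
 * the ESTALE retries run out) it drops the last path component and tries
 * the parent, stopping after "/". Expects an absolute path.
 * Returns 0 and fills *buf on success, -1 on failure. */
static int statfs_walk_up(const char *path, struct statfs *buf)
{
    char file[4096];
    size_t len = strlen(path);

    if ('/' != path[0]) { errno = EINVAL; return -1; }
    if (len >= sizeof(file)) { errno = ENAMETOOLONG; return -1; }
    memcpy(file, path, len + 1);

    for (;;) {
        int rc, trials = 5;   /* retry count is an arbitrary choice */
        do {
            rc = statfs(file, buf);
        } while (-1 == rc && ESTALE == errno && (0 < --trials));

        if (0 == rc) {
            return 0;
        }
        if (0 == strcmp(file, "/")) {
            return -1;        /* even the root failed; errno comes from statfs("/") */
        }

        /* Move to the parent directory and try again. */
        char *slash = strrchr(file, '/');
        if (slash == file) {
            file[1] = '\0';   /* "/a"   -> "/"  */
        } else {
            *slash = '\0';    /* "/a/b" -> "/a" */
        }
    }
}
```

With this shape, a nonexistent path such as "/no/such/dir" still ends up at the filesystem containing "/", which is why, as Jeff says, at least one statfs() call should succeed.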

I looked a bit deeper and found that the bug is in OMPI, but a simple
one to fix.
I added 2 lines to opal/util/path.c:

--- openmpi-1.4.5rc2-orig/opal/util/path.c 2011-02-04 07:38:16.000000000 -0600
+++ openmpi-1.4.5rc2/opal/util/path.c 2012-01-27 12:46:30.000000000 -0600
@@ -476,6 +476,8 @@
          rc = statvfs (file, &buf);
  #elif defined(linux) || defined (__BSD) || (defined(__APPLE__) && defined(__MACH__))
          rc = statfs (file, &buf);
+#else
+ #error "No statvfs or statfs call"
  #endif
      } while (-1 == rc && ESTALE == errno && (0 < --trials));

Can you guess what happens when I "make" now?
The build stops at the #error: there IS no call to statfs, and the ENOENT I
saw must have been "left over" in errno from some earlier libc call.
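The "left over" errno can be reproduced in isolation. The sketch below assumes the compiled-out branch leaves rc at -1; that initial value and both function names are hypothetical, since the real initializer in path.c is not shown in the diff. An unrelated fopen() fails with ENOENT, and because no statvfs()/statfs() call survives preprocessing, that stale value is what the caller finds in errno.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for the retry loop when none of the #if/#elif
 * conditions matched: the preprocessor removed the only call, so rc keeps
 * its initial value and errno is never touched. */
static int check_path_compiled_out(const char *file)
{
    int rc = -1;   /* assumed initial value, for illustration only */
    (void) file;   /* no statvfs()/statfs() call survives preprocessing */
    return rc;
}

/* Returns the errno value a caller would (wrongly) blame on the missing call. */
static int stale_errno_demo(void)
{
    FILE *fp = fopen("/no/such/dir/no-such-file", "r");  /* fails with ENOENT */
    if (NULL != fp) {
        fclose(fp);
    }
    int rc = check_path_compiled_out("/tmp");
    /* rc is -1, and errno still holds ENOENT from the unrelated fopen() */
    return (-1 == rc) ? errno : 0;
}
```

The general rule is that errno is meaningful only immediately after a call that has reported failure; a branch that makes no call at all leaves whatever value an earlier call stored there.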

The problem is that these compilers do not predefine "linux".
They do appear to define "__linux" and "__linux__" (with double
underscores).
So, a small change to the preprocessor logic should fix this problem:
    $ perl -pi -e 's/defined\(linux\)/defined\(__linux__\)/;' -- opal/util/path.c
[more compact than the corresponding diffs]
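A quick way to check which spellings a given compiler predefines is a small probe like the one below; the function and flag names are hypothetical. Under GCC, strict ISO modes such as -std=c99 omit the plain "linux" macro because it is not in the implementation's reserved namespace, while "__linux" and "__linux__" stay defined. That is one reason "__linux__" is the safer spelling to test.

```c
/* Bit flags recording which Linux macros the compiler predefines. */
enum { HAS_linux = 1, HAS___linux = 2, HAS___linux__ = 4 };

static int linux_macros_defined(void)
{
    int flags = 0;
#if defined(linux)
    flags |= HAS_linux;      /* user-namespace spelling; GCC drops it under -std=c99 */
#endif
#if defined(__linux)
    flags |= HAS___linux;
#endif
#if defined(__linux__)
    flags |= HAS___linux__;  /* reserved-namespace spelling */
#endif
    return flags;
}
```

Compiling the probe with each of the affected compilers and printing the result would show directly which of the three checks in the OMPI tree can match.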

With that change (and without "disabling" opal_path_nfs.c) all 4
compilers are PASSing "make all install check".

Source inspection suggests that the 1.5 branch has the same issue.
I've not inspected the HEAD, but somebody should.

FYI:
I've done a bit of grepping for linux, __linux, and __linux__.
My search shows only 2 files checking whether "linux" is defined:
    opal/util/path.c
    opal/mca/memory/ptmalloc2/malloc.c
And exactly one looking for "__linux":
    test/event/event-test.c
Checks for "__linux__" appear in the following files:
    ompi/mca/io/romio/romio/adio/ad_lustre/ad_lustre.h
    ompi/mca/btl/openib/btl_openib_component.c
    opal/util/if.c
    opal/mca/memory/ptmalloc2/arena.c
    test/util/opal_path_nfs.c (IRONY!)
I suggest standardization to "__linux__" in the 3 files that currently
use "linux" or "__linux".

-Paul

-- 
Paul H. Hargrove                          PHHargrove_at_[hidden]
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900