I am a developer on the Darshan project
(http://www.mcs.anl.gov/darshan), which provides a set of lightweight
wrappers to characterize the I/O access patterns of MPI applications.
Darshan can operate on static or dynamic executables. As you might
expect, it uses the LD_PRELOAD mechanism to intercept I/O calls like
open(), read(), write() and stat() on dynamic executables.
However, we recently received an unusual bug report (courtesy of Myriam
Botalla) involving Darshan in LD_PRELOAD mode with Open MPI 1.6.3.
When Darshan intercepts a function call via LD_PRELOAD, it must use
dlsym() to locate the "real" underlying function to invoke, and dlsym()
in turn calls calloc() internally. In most cases this is fine, but Open
MPI makes its first stat() call within its malloc initialization hook
(opal_memory_linux_malloc_init_hook()), before malloc() and its related
functions have been configured. Darshan therefore (indirectly) triggers
a segfault: it intercepts those stat() calls, but it cannot look up the
real stat() function without calling into the allocator.
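For readers less familiar with the interception pattern, here is a minimal sketch of an LD_PRELOAD-style wrapper. It is not Darshan's code: it wraps close() rather than stat() to stay clear of glibc-version-specific details, and the counter wrapped_close_calls is a hypothetical stand-in for a tracing tool's bookkeeping. It assumes Linux with glibc; on glibc older than 2.34 it must be linked with -ldl.

```c
#define _GNU_SOURCE /* exposes RTLD_NEXT in <dlfcn.h> */
#include <dlfcn.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical counter standing in for the statistics a tracing tool keeps. */
static long wrapped_close_calls = 0;

/* Pointer to the next ("real") close() in the symbol search order,
 * resolved lazily on the first intercepted call. */
static int (*real_close)(int) = NULL;

/* Same name and signature as the libc function, so callers in the process
 * resolve close() to this definition instead of libc's. */
int close(int fd)
{
    if (real_close == NULL) {
        /* dlsym() may allocate memory internally. If the first intercepted
         * call arrives before the allocator is usable (for example from a
         * malloc initialization hook), this lookup is where things break. */
        real_close = (int (*)(int))dlsym(RTLD_NEXT, "close");
        if (real_close == NULL) {
            errno = ENOSYS; /* lookup failed; a real tool would report it */
            return -1;
        }
    }
    wrapped_close_calls++;      /* record the call ... */
    return real_close(fd);      /* ... then forward it to libc */
}
```

Built with `gcc -shared -fPIC` and loaded through LD_PRELOAD, the same definition would interpose on close() in an unmodified binary. The unsynchronized lazy initialization is not thread-safe; it is kept simple here for illustration.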
There is some more detailed information about this issue, including a
stack trace, in this mailing list thread:
Looking a little more closely at opal_memory_linux_malloc_init_hook(),
it appears that the struct stat output argument of stat() is ignored in
all cases: Open MPI only checks the stat() return code to determine
whether the files in question exist. Taking that into account, would it
be possible to make a minor change in Open MPI to replace these stat()
existence checks in opal_memory_linux_malloc_init_hook() with access()
calls? There is a slight
technical advantage to the change in that access() is lighter weight
than stat() on some systems (and it might arguably make the intent of
the calls a little clearer), but of course my main motivation here is to
have Open MPI use a function that is less likely to be intercepted by
I/O tracing tools before a malloc implementation has been initialized.
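In code, the proposed change amounts to swapping one existence test for another. A minimal sketch, using hypothetical helper names; this is not Open MPI's actual code, and the paths it checks are not reproduced here:

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* Existence check in the current style: the struct stat output is filled
 * in but never read; only the return code matters. */
static bool exists_via_stat(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0;
}

/* Proposed equivalent: access() with F_OK tests only for existence and
 * needs no output buffer. */
static bool exists_via_access(const char *path)
{
    return access(path, F_OK) == 0;
}
```

One caveat: access() performs its checks using the real rather than the effective user and group IDs, so the two helpers could disagree in a setuid program; for an ordinary MPI process they should give the same answer.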
Technically it is possible to work around this in Darshan itself by
checking the arguments passed to stat() and special-casing this call,
but that isn't a very safe solution in the long run.
Thanks in advance for your time and consideration,