Open MPI User's Mailing List Archives

From: Pierre Valiron (Pierre.Valiron_at_[hidden])
Date: 2006-03-08 04:46:18

Sorry for the interruption. I am back on the MPI track again.

I have rebuilt openmpi-1.0.2a9 with -g and the error is unchanged.

I have also discovered that I don't need to run any openmpi application
to trigger the error.

Running either mpirun --help or plain mpirun produces the same error:
valiron_at_icare ~ > mpirun
Segmentation fault (core dumped)

valiron_at_icare ~ > pstack core
core 'core' of 13842: mpirun
 fffffd7ffee9dfe0 strlen () + 20
 fffffd7ffeef6ab3 vsprintf () + 33
 fffffd7fff180fd1 opal_vasprintf () + 41
 fffffd7fff180f88 opal_asprintf () + 98
 00000000004098a3 orterun () + 63
 0000000000407214 main () + 34
 000000000040708c ???????? ()

Seems very basic!

Using dbx produces a little more info, though it is cryptic to me:

valiron_at_icare ~ > dbx /users/valiron/lib/openmpi-1.0.2a9/bin/mpirun
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.5' in
your .dbxrc
Reading mpirun
(dbx) run
Running: mpirun
(process id 13881)
t_at_1 (l_at_1) signal SEGV (no mapping at the fault address) in strlen at
0xfffffd7ffee9dfe0: strlen+0x0020: cmpb $0x0000000000000000,(%rsi)
Current function is opal_vasprintf (optimized)
  206 length = vsprintf(*ptr, fmt, ap);

For reference, here is an excerpt of the man page for vsprintf():

Standard C Library Functions vprintf(3C)

     vprintf, vfprintf, vsprintf, vsnprintf - print formatted
     output of a variable argument list

     #include <stdio.h>
     #include <stdarg.h>

     int vprintf(const char *format, va_list ap);

     int vfprintf(FILE *stream, const char *format, va_list ap);

     int vsprintf(char *s, const char *format, va_list ap);

     int vsnprintf(char *s, size_t n, const char *format, va_list ap);

     The vprintf(), vfprintf(), vsprintf() and vsnprintf() func-
     tions are the same as printf(), fprintf(), sprintf(), and
     snprintf(), respectively, except that instead of being
     called with a variable number of arguments, they are called
     with an argument list as defined in the <stdarg.h> header.
     See printf(3C).

     The <stdarg.h> header defines the type va_list and a set of
     macros for advancing through a list of arguments whose
     number and types may vary. The argument ap to the vprint
     family of functions is of type va_list. This argument is
     used with the <stdarg.h> header file macros va_start(),
     va_arg(), and va_end() (see stdarg(3EXT)). The EXAMPLES
     section below demonstrates the use of va_start() and
     va_end() with vprintf().

     The macro va_alist() is used as the parameter list in a
     function definition, as in the function called error() in
     the example below. The macro va_start(ap, parmN), where ap
     is of type va_list and parmN is the rightmost parameter
     (just before ...), must be called before any attempt to
     traverse and access unnamed arguments is made. The
     va_end(ap) macro must be invoked when all desired arguments
     have been accessed. The argument list in ap can be traversed
     again if va_start() is called again after va_end(). In the
     example below, the error() arguments (arg1, arg2, ...) are
     passed to vfprintf() in the argument ap.

     Refer to printf(3C).

     The vprintf() and vfprintf() functions will fail if either
     the stream is unbuffered or the stream's buffer needed to be
     flushed and:

     EFBIG The file is a regular file and an attempt
                     was made to write at or beyond the offset
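
To make the va_list rules quoted above concrete, here is a minimal sketch of an asprintf-style helper built on vsnprintf(). It is not the Open MPI code: my_vasprintf, my_asprintf and safe_str are hypothetical names, and the sketch assumes a C99 library (va_copy, and vsnprintf() returning the needed length when given a zero-size buffer). It illustrates two ways a crash in strlen() under vsprintf() can arise in general: a va_list that is traversed twice without va_copy(), and a NULL pointer passed for %s. Whether either applies to the trace above is only a guess.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (an assumption, not opal_vasprintf): format into a
 * freshly malloc'ed string. Returns the string length, or -1 on failure. */
static int my_vasprintf(char **out, const char *fmt, va_list ap)
{
    va_list ap2;
    int needed, written;

    assert(out != NULL && fmt != NULL);
    *out = NULL;

    /* The sizing pass traverses the argument list, so run it on a copy
     * and leave the caller's va_list untouched for the real pass. */
    va_copy(ap2, ap);
    needed = vsnprintf(NULL, 0, fmt, ap2);
    va_end(ap2);
    if (needed < 0) {
        return -1;
    }

    *out = malloc((size_t)needed + 1);
    if (*out == NULL) {
        return -1;
    }

    /* Bounded write: never more than needed + 1 bytes. */
    written = vsnprintf(*out, (size_t)needed + 1, fmt, ap);
    if (written < 0) {
        free(*out);
        *out = NULL;
        return -1;
    }
    return written;
}

/* Variadic front end: va_start() before use, va_end() when done. */
static int my_asprintf(char **out, const char *fmt, ...)
{
    va_list ap;
    int rc;

    va_start(ap, fmt);
    rc = my_vasprintf(out, fmt, ap);
    va_end(ap);
    return rc;
}

/* Passing NULL for %s is undefined behavior, so substitute a placeholder. */
static const char *safe_str(const char *s)
{
    return s != NULL ? s : "(null)";
}
```

A caller would write something like my_asprintf(&s, "%s:%d", safe_str(name), port) and free(s) afterwards. The sizing pass on a copy avoids reusing a consumed va_list, which matters on ABIs where va_list is not a plain pointer.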

Any ideas?

Of course I would be glad to provide an account on the machine (though,
for security reasons, not via the list).


Brian Barrett wrote:
> On Feb 27, 2006, at 8:50 AM, Pierre Valiron wrote:
>> - Make completed nicely, except for compiling ompi/mpi/f90/mpi.f90,
>> which took nearly half an hour to complete. I suspect the
>> optimization flags in FFLAGS are not important for applications,
>> and I could use -O0 or -O1 instead.
> You probably won't see any performance impact at all if you compile
> the Fortran 90 layer of Open MPI with no optimizations. It's a very
> thin wrapper and the compiler isn't going to be able to do much with
> it anyway. One other thing - if you know your F90 code never sends
> arrays greater than dimension X (X defaults to 4), you can speed
> things up immensely by configuring Open MPI with the option
> --with-f90-max-array-dim=X.
>> - However the resulting executable fails to launch:
>> valiron_at_icare ~/config > mpirun --prefix /users/valiron/lib/openmpi-1.0.2a9 -np 2 a.out
>> Segmentation fault (core dumped)
>> - The problem seems to be buried inside Open MPI:
>> valiron_at_icare ~/config > pstack core
>> core 'core' of 27996: mpirun --prefix /users/valiron/lib/openmpi-1.0.2a9 -np 2 a.out
>> fffffd7fff05dfe0 strlen () + 20
>> fffffd7fff0b6ab3 vsprintf () + 33
>> fffffd7fff2e4211 opal_vasprintf () + 41
>> fffffd7fff2e41c8 opal_asprintf () + 98
>> 00000000004098a3 orterun () + 63
>> 0000000000407214 main () + 34
>> 000000000040708c ???????? ()
> Ugh... Yes, we're probably doing something wrong there.
> Unfortunately, neither Jeff nor I have access to an Opteron box
> running Solaris and I can't replicate the problem on either an
> UltraSparc running Solaris or an Opteron running Linux. Could you
> compile Open MPI with CFLAGS set to
> "-g -O -xtarget=opteron -xarch=amd64"? Hopefully being able to see
> the callstack with some line numbers will help a bit.
> Brian

Support the SAUVONS LA RECHERCHE movement:
       _/_/_/_/    _/       _/       Dr. Pierre VALIRON
      _/     _/   _/      _/   Laboratoire d'Astrophysique
     _/     _/   _/     _/    Observatoire de Grenoble / UJF
    _/_/_/_/    _/    _/    BP 53  F-38041 Grenoble Cedex 9 (France)
   _/          _/   _/
  _/          _/  _/     Mail: Pierre.Valiron_at_[hidden]
 _/          _/ _/      Phone: +33 4 7651 4787  Fax: +33 4 7644 8821
_/          _/_/