From: Pierre Valiron (Pierre.Valiron_at_[hidden])
Date: 2006-03-08 04:46:18

Sorry for the interruption. I am back on the MPI track again.

I have rebuilt openmpi-1.0.2a9 with -g and the error is unchanged.

I have also discovered that I don't need to run any Open MPI application
to trigger the error.

Running mpirun --help, or mpirun with no arguments, triggers the same error:
valiron_at_icare ~ > mpirun
Segmentation fault (core dumped)

valiron_at_icare ~ > pstack core
core 'core' of 13842: mpirun
 fffffd7ffee9dfe0 strlen () + 20
 fffffd7ffeef6ab3 vsprintf () + 33
 fffffd7fff180fd1 opal_vasprintf () + 41
 fffffd7fff180f88 opal_asprintf () + 98
 00000000004098a3 orterun () + 63
 0000000000407214 main () + 34
 000000000040708c ???????? ()

Seems very basic !

Using dbx produces a little more information, which is unfortunately cryptic to me:

valiron_at_icare ~ > dbx /users/valiron/lib/openmpi-1.0.2a9/bin/mpirun
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.5' in
your .dbxrc
Reading mpirun
(dbx) run
Running: mpirun
(process id 13881)
t@1 (l@1) signal SEGV (no mapping at the fault address) in strlen at
0xfffffd7ffee9dfe0: strlen+0x0020: cmpb $0x0000000000000000,(%rsi)
Current function is opal_vasprintf (optimized)
  206 length = vsprintf(*ptr, fmt, ap);

For reference, I copied the man page for vsprintf().
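
A guess rather than a diagnosis: a fault in strlen() underneath vsprintf()
is what one would expect if a "%s" conversion were handed a NULL or
otherwise invalid pointer. The C standard leaves that case undefined; some
C libraries print "(null)", while others dereference the pointer and crash.
The trace above does not show which argument is involved, so the sketch
below only illustrates the general defensive pattern. The names safe_str
and sketch_asprintf are hypothetical and are not Open MPI functions.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: never hand a NULL pointer to a "%s" conversion. */
const char *safe_str(const char *s)
{
    return (s != NULL) ? s : "(null)";
}

/*
 * Hypothetical asprintf-like function (an assumption for illustration,
 * not the opal_asprintf implementation). It measures the formatted
 * length with a C99 vsnprintf(NULL, 0, ...) call, allocates exactly
 * that much, then formats for real. Returns the length, or -1 on error.
 */
int sketch_asprintf(char **out, const char *fmt, ...)
{
    va_list ap, ap_copy;
    int len;

    assert(out != NULL && fmt != NULL);

    va_start(ap, fmt);
    va_copy(ap_copy, ap);               /* a va_list can be consumed only once */
    len = vsnprintf(NULL, 0, fmt, ap_copy);
    va_end(ap_copy);

    if (len < 0) {
        va_end(ap);
        *out = NULL;
        return -1;
    }

    *out = malloc((size_t)len + 1);     /* +1 for the terminating NUL */
    if (*out == NULL) {
        va_end(ap);
        return -1;
    }

    len = vsnprintf(*out, (size_t)len + 1, fmt, ap);
    va_end(ap);
    return len;
}
```

A caller would wrap every string argument that might be unset, for
example sketch_asprintf(&msg, "prefix=%s", safe_str(prefix)), so that the
result no longer depends on how the C library treats NULL.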

Any ideas?

Of course, I would be glad to provide an account on the machine (but, for
security reasons, not via the list...).


Brian Barrett wrote:
> On Feb 27, 2006, at 8:50 AM, Pierre Valiron wrote:
>> - Make completed nicely, except for compiling ompi/mpi/f90/mpi.f90,
>> which took nearly half an hour to complete. I suspect the
>> optimization flags in FFLAGS are not important for applications,
>> and I could use -O0 or -O1 instead.
> You probably won't see any performance impact at all if you compile
> the Fortran 90 layer of Open MPI with no optimizations. It's a very
> thin wrapper and the compiler isn't going to be able to do much with
> it anyway. One other thing - if you know your F90 code never sends
> arrays greater than dimension X (X defaults to 4), you can speed
> things up immensely by configuring Open MPI with the option
> --with-f90-max-array-dim=X.
>> - However the resulting executable fails to launch:
>> valiron_at_icare ~/config > mpirun --prefix /users/valiron/lib/openmpi-1.0.2a9 -np 2 a.out
>> Segmentation fault (core dumped)
>> - The problem seems to be buried in Open MPI:
>> valiron_at_icare ~/config > pstack core
>> core 'core' of 27996: mpirun --prefix /users/valiron/lib/openmpi-1.0.2a9 -np 2 a.out
>> fffffd7fff05dfe0 strlen () + 20
>> fffffd7fff0b6ab3 vsprintf () + 33
>> fffffd7fff2e4211 opal_vasprintf () + 41
>> fffffd7fff2e41c8 opal_asprintf () + 98
>> 00000000004098a3 orterun () + 63
>> 0000000000407214 main () + 34
>> 000000000040708c ???????? ()
> Ugh... Yes, we're probably doing something wrong there.
> Unfortunately, neither Jeff nor I have access to an Opteron box
> running Solaris and I can't replicate the problem on either an
> UltraSparc running Solaris or an Opteron running Linux. Could you
> compile Open MPI with CFLAGS set to
> "-g -O -xtarget=opteron -xarch=amd64"? Hopefully being able to see the
> callstack with some line numbers will help a bit.
> Brian

