Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Troubles with MPI-IO Test and Torque/PVFS
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-04-13 08:27:11


It looks like you're seg faulting when calling some flavor of printf
(perhaps vsnprintf?) in the make_error_messages() function.

You might want to double check the read_write_file() function to see
exactly what kind of error it is encountering such that it is calling
report_errs().
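
For illustration only, here is a minimal sketch of a defensive way to build
this kind of warning message. I have not seen the mpiIO_test source, so every
name here (format_error_message, safe_str, format_bad_bytes) is hypothetical
and is not the test's actual code. It assumes the crash comes from an
argument that does not match its conversion specifier (for example an integer
consumed by %s, or a 64-bit offset passed where %d is expected), which would
be consistent with vsnprintf faulting at a small address such as 0x10000, but
that is a guess. The format attribute is a GCC/Clang extension that makes the
compiler check each call against its format string.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the test's message formatter (not the real
 * make_error_messages). With GCC/Clang, the format attribute asks the
 * compiler to check the variadic arguments against the format string. */
#if defined(__GNUC__)
__attribute__((format(printf, 3, 4)))
#endif
static int format_error_message(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(buf != NULL && size > 0 && fmt != NULL);
    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);  /* never writes more than size bytes */
    va_end(ap);
    return n;
}

/* Replace a NULL string with a printable placeholder before using %s;
 * passing NULL to %s is undefined behavior in standard C. */
static const char *safe_str(const char *s)
{
    return s ? s : "(null)";
}

/* Build a "bad bytes" warning. Counts and offsets are taken as long long
 * and printed with %lld, so the argument width always matches the
 * specifier, even on a 32-bit build. */
int format_bad_bytes(char *buf, size_t size, int rank,
                     long long nbad, long long offset,
                     const char *expected, const char *received)
{
    return format_error_message(buf, size,
        "Rank %d: %lld bad bytes at file offset %lld. "
        "Expected %s, received %s",
        rank, nbad, offset, safe_str(expected), safe_str(received));
}
```

Compiling the real test with -Wall (which enables -Wformat on GCC) may point
at the mismatched call directly, if that is indeed the cause.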

On Apr 10, 2008, at 3:29 PM, Davi Vercillo C. Garcia wrote:
> Hi all,
>
> I have a cluster with Torque and PVFS. I'm trying to test my
> environment with MPI-IO Test, but some segfaults are occurring.
> Does anyone know what is happening? The error output is below:
>
> Rank 1 Host campogrande03.dcc.ufrj.br WARNING ERROR 1207853304: 1 bad
> bytes at file offset 0. Expected (null), received (null)
> Rank 2 Host campogrande02.dcc.ufrj.br WARNING ERROR 1207853304: 1 bad
> bytes at file offset 0. Expected (null), received (null)
> [campogrande01:10646] *** Process received signal ***
> Rank 0 Host campogrande04.dcc.ufrj.br WARNING ERROR 1207853304: 1 bad
> bytes at file offset 0. Expected (null), received (null)
> Rank 0 Host campogrande04.dcc.ufrj.br WARNING ERROR 1207853304: 65537
> bad bytes at file offset 0. Expected (null), received (null)
> [campogrande04:05192] *** Process received signal ***
> [campogrande04:05192] Signal: Segmentation fault (11)
> [campogrande04:05192] Signal code: Address not mapped (1)
> [campogrande04:05192] Failing at address: 0x10000
> Rank 1 Host campogrande03.dcc.ufrj.br WARNING ERROR 1207853304: 65537
> bad bytes at file offset 0. Expected (null), received (null)
> [campogrande03:05377] *** Process received signal ***
> [campogrande03:05377] Signal: Segmentation fault (11)
> [campogrande03:05377] Signal code: Address not mapped (1)
> [campogrande03:05377] Failing at address: 0x10000
> [campogrande03:05377] [ 0] [0xffffe440]
> [campogrande03:05377] [ 1]
> /lib/tls/i686/cmov/libc.so.6(vsnprintf+0xb4) [0xb7d5fef4]
> [campogrande03:05377] [ 2] mpiIO_test(make_error_messages+0xcf)
> [0x80502e4]
> [campogrande03:05377] [ 3] mpiIO_test(warning_msg+0x8c) [0x8050569]
> [campogrande03:05377] [ 4] mpiIO_test(report_errs+0xe2) [0x804d413]
> [campogrande03:05377] [ 5] mpiIO_test(read_write_file+0x594)
> [0x804d9c2]
> [campogrande03:05377] [ 6] mpiIO_test(main+0x1d0) [0x804aa14]
> [campogrande03:05377] [ 7]
> /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe0) [0xb7d15050]
> [campogrande03:05377] [ 8] mpiIO_test [0x804a7e1]
> [campogrande03:05377] *** End of error message ***
> Rank 2 Host campogrande02.dcc.ufrj.br WARNING ERROR 1207853304: 65537
> bad bytes at file offset 0. Expected (null), received (null)
> [campogrande02:05187] *** Process received signal ***
> [campogrande02:05187] Signal: Segmentation fault (11)
> [campogrande02:05187] Signal code: Address not mapped (1)
> [campogrande02:05187] Failing at address: 0x10000
> [campogrande01:10646] Signal: Segmentation fault (11)
> [campogrande01:10646] Signal code: Address not mapped (1)
> [campogrande01:10646] Failing at address: 0x1a0000
> [campogrande02:05187] [ 0] [0xffffe440]
> [campogrande02:05187] [ 1]
> /lib/tls/i686/cmov/libc.so.6(vsnprintf+0xb4) [0xb7d5fef4]
> [campogrande02:05187] [ 2] mpiIO_test(make_error_messages+0xcf)
> [0x80502e4]
> [campogrande02:05187] [ 3] mpiIO_test(warning_msg+0x8c) [0x8050569]
> [campogrande02:05187] [ 4] mpiIO_test(report_errs+0xe2) [0x804d413]
> [campogrande02:05187] [ 5] mpiIO_test(read_write_file+0x594)
> [0x804d9c2]
> [campogrande02:05187] [ 6] mpiIO_test(main+0x1d0) [0x804aa14]
> [campogrande02:05187] [ 7]
> /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe0) [0xb7d15050]
> [campogrande02:05187] [ 8] mpiIO_test [0x804a7e1]
> [campogrande02:05187] *** End of error message ***
> [campogrande04:05192] [ 0] [0xffffe440]
> [campogrande04:05192] [ 1]
> /lib/tls/i686/cmov/libc.so.6(vsnprintf+0xb4) [0xb7d5fef4]
> [campogrande04:05192] [ 2] mpiIO_test(make_error_messages+0xcf)
> [0x80502e4]
> [campogrande04:05192] [ 3] mpiIO_test(warning_msg+0x8c) [0x8050569]
> [campogrande04:05192] [ 4] mpiIO_test(report_errs+0xe2) [0x804d413]
> [campogrande04:05192] [ 5] mpiIO_test(read_write_file+0x594)
> [0x804d9c2]
> [campogrande04:05192] [ 6] mpiIO_test(main+0x1d0) [0x804aa14]
> [campogrande04:05192] [ 7]
> /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe0) [0xb7d15050]
> [campogrande04:05192] [ 8] mpiIO_test [0x804a7e1]
> [campogrande04:05192] *** End of error message ***
> [campogrande01:10646] [ 0] [0xffffe440]
> [campogrande01:10646] [ 1]
> /lib/tls/i686/cmov/libc.so.6(vsnprintf+0xb4) [0xb7d5fef4]
> [campogrande01:10646] [ 2] mpiIO_test(make_error_messages+0xcf)
> [0x80502e4]
> [campogrande01:10646] [ 3] mpiIO_test(warning_msg+0x8c) [0x8050569]
> [campogrande01:10646] [ 4] mpiIO_test(report_errs+0xe2) [0x804d413]
> [campogrande01:10646] [ 5] mpiIO_test(read_write_file+0x594)
> [0x804d9c2]
> [campogrande01:10646] [ 6] mpiIO_test(main+0x1d0) [0x804aa14]
> [campogrande01:10646] [ 7]
> /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe0) [0xb7d15050]
> [campogrande01:10646] [ 8] mpiIO_test [0x804a7e1]
> [campogrande01:10646] *** End of error message ***
> mpiexec noticed that job rank 0 with PID 5192 on node campogrande04
> exited on signal 11 (Segmentation fault).
>
> --
> Davi Vercillo Carneiro Garcia
>
> Universidade Federal do Rio de Janeiro
> Departamento de Ciência da Computação
> DCC-IM/UFRJ - http://www.dcc.ufrj.br
>
> "Good things come to those who... wait." - Debian Project
>
> "A computer is like air conditioning: it becomes useless when you open
> windows." - Linus Torvalds
>
> "There are two infinite things, the universe and human stupidity. And I
> am in doubt about the first." - Albert Einstein

-- 
Jeff Squyres
Cisco Systems