Open MPI Development Mailing List Archives

Subject: [OMPI devel] seg fault in openmpi-1.3.1 when shell in passwd is empty
From: Sergey E. Koposov (math_at_[hidden])
Date: 2009-03-24 01:37:55


Hi All,

I've found that openmpi-1.3.1 segfaults when the shell field in the
passwd file is empty.

To reproduce, take a simple program that does nothing:
--------------------------------------
#include <stdio.h>
#include "mpi.h"

int main(int argc, char **argv)
{
    int nworkers, whoami;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &whoami);   /* rank of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &nworkers); /* total number of processes */
    printf("%d %d\n", whoami, nworkers);
    MPI_Finalize();
    return 0;
}
----------------------------------
Compile it and run it; mpirun segfaults:
----------------------------------
[fortune:05346] *** Process received signal ***
[fortune:05346] Signal: Segmentation fault (11)
[fortune:05346] Signal code: Address not mapped (1)
[fortune:05346] Failing at address: 0x1
[fortune:05346] [ 0] [0xffffe40c]
[fortune:05346] [ 1] /usr/lib/openmpi/mca_plm_rsh.so [0xb7f9baa1]
[fortune:05346] [ 2] /usr/lib/openmpi/mca_plm_rsh.so [0xb7f9d291]
[fortune:05346] [ 3] /usr/bin/mpirun [0x804a8cb]
[fortune:05346] [ 4] /usr/bin/mpirun [0x8049ff2]
[fortune:05346] [ 5] /lib/libc.so.6(__libc_start_main+0xe0) [0xb7d56390]
[fortune:05346] [ 6] /usr/bin/mpirun [0x8049f71]
[fortune:05346] *** End of error message ***
--------------------------
Here is the gdb backtrace:
------------------------------
0xb7dc08c1 in strcmp () from /lib/libc.so.6
(gdb) bt
#0 0xb7dc08c1 in strcmp () from /lib/libc.so.6
#1 0xb7f0ecc9 in find_shell (shell=0x8074b95 "") at plm_rsh_module.c:1459
#2 0xb7f0ce8b in setup_launch (argcptr=0xbfce5960, argvptr=0xbfce5968,
     nodename=0x80795c0 "fortune", node_name_index1=0xbfce5970,
     proc_vpid_index=0xbfce596c, prefix_dir=0x805b028 "/tmp/openmpi_inst")
     at plm_rsh_module.c:376
#3 0xb7f0e181 in orte_plm_rsh_launch (jdata=0x80539a8)
     at plm_rsh_module.c:1051
#4 0x0804a8eb in orterun (argc=4, argv=0xbfce5b74) at orterun.c:680
#5 0x0804a012 in main (argc=Cannot access memory at address 0x1
) at main.c:13
(gdb)
---------------------------
The segfault clearly occurs because the shell field returned by
getpwuid(getuid()) is empty (as it is in /etc/passwd). As far as I
understand, an empty shell field in the passwd file is perfectly valid and
is equivalent to /bin/sh (see man 5 passwd).
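
To make the intended rule concrete, here is a minimal sketch of the
fallback described above. The helper name effective_shell() is
hypothetical and is not part of Open MPI; the NULL check is an extra
precaution I am assuming is harmless, not something the trace shows.
--------------------------------------
```c
#include <stddef.h>

/* Hypothetical helper (not Open MPI code): map an empty or missing
 * passwd shell field to /bin/sh, following passwd(5). */
static const char *effective_shell(const char *pw_shell)
{
    if (pw_shell == NULL || pw_shell[0] == '\0') {
        return "/bin/sh";   /* empty field defaults to /bin/sh */
    }
    return pw_shell;        /* otherwise use the field unchanged */
}
```
--------------------------------------
A caller would pass it the pw_shell member of the struct returned by
getpwuid(getuid()) and use the result wherever a shell path is needed.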

So I guess setup_launch() should simply check for an empty pw_shell
before calling find_shell(). Something like this:
-----------------------------------------------
--- openmpi-1.3.1/orte/mca/plm/rsh/plm_rsh_module.c.orig 2009-03-24 06:22:06.000000000 +0100
+++ openmpi-1.3.1/orte/mca/plm/rsh/plm_rsh_module.c 2009-03-24 06:24:07.000000000 +0100
@@ -372,8 +372,11 @@
          orte_show_help( "help-plm-rsh.txt", "unknown-user", true, (int)getuid() );
          return ORTE_ERR_FATAL;
      } else {
- param = p->pw_shell;
- local_shell = find_shell(p->pw_shell);
+ if (!((p->pw_shell)[0]))
+ param="/bin/sh";
+ else
+ param = p->pw_shell;
+ local_shell = find_shell(param);
      }
      /* If we didn't find it in getpwuid(), try looking at the $SHELL
       environment variable (see https://svn.open-mpi.org/trac/ompi/ticket/1060)
----------------------
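
The failing address 0x1 in the trace would be consistent with code that
looks for the last '/' in the shell path and steps past it without
checking whether one was found. That is only a guess from the trace, not
a reading of find_shell() itself. Under that assumption, a defensive
basename lookup could look like the sketch below; shell_basename() is a
hypothetical name, not an Open MPI function.
--------------------------------------
```c
#include <string.h>

/* Hypothetical helper (not Open MPI code): return the part of a shell
 * path after the last '/', without dereferencing a failed search. */
static const char *shell_basename(const char *shell)
{
    const char *slash;

    if (shell == NULL) {
        return "";                      /* nothing to inspect */
    }
    slash = strrchr(shell, '/');        /* NULL when there is no '/' */
    return (slash != NULL) ? slash + 1 : shell;
}
```
--------------------------------------
With an empty string this returns the empty string itself, so a later
strcmp() against known shell names simply fails to match instead of
reading an invalid address.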

Regards,
         Sergey

*******************************************************************
Sergey E. Koposov
Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg Astronomical Institute
Tel: +49-6221-528-349
Web: http://lnfm1.sai.msu.ru/~math
E-mail: math_at_[hidden]