Open MPI Development Mailing List Archives

Subject: [OMPI devel] seg fault in openmpi-1.3.1 when shell in passwd is empty
From: Sergey E. Koposov (math_at_[hidden])
Date: 2009-03-24 01:37:55


Hi All,

I've found that openmpi-1.3.1 segfaults when the shell field in the
passwd file is empty.

To reproduce, I take a trivial MPI program that only prints its rank and
the number of processes:
--------------------------------------
#include <stdio.h>
#include "mpi.h"
int main (int argc, char **argv) {
    int nworkers, whoami;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &whoami);
    MPI_Comm_size(MPI_COMM_WORLD, &nworkers);
    /* print this process's rank and the total number of processes */
    printf("%d %d ",whoami, nworkers);
    MPI_Finalize();
    return 0;
}
----------------------------------
Compile it and run it under mpirun.
mpirun itself segfaults:
----------------------------------
[fortune:05346] *** Process received signal ***
[fortune:05346] Signal: Segmentation fault (11)
[fortune:05346] Signal code: Address not mapped (1)
[fortune:05346] Failing at address: 0x1
[fortune:05346] [ 0] [0xffffe40c]
[fortune:05346] [ 1] /usr/lib/openmpi/mca_plm_rsh.so [0xb7f9baa1]
[fortune:05346] [ 2] /usr/lib/openmpi/mca_plm_rsh.so [0xb7f9d291]
[fortune:05346] [ 3] /usr/bin/mpirun [0x804a8cb]
[fortune:05346] [ 4] /usr/bin/mpirun [0x8049ff2]
[fortune:05346] [ 5] /lib/libc.so.6(__libc_start_main+0xe0) [0xb7d56390]
[fortune:05346] [ 6] /usr/bin/mpirun [0x8049f71]
[fortune:05346] *** End of error message ***
--------------------------
Here is the gdb backtrace:
------------------------------
0xb7dc08c1 in strcmp () from /lib/libc.so.6
(gdb) bt
#0 0xb7dc08c1 in strcmp () from /lib/libc.so.6
#1 0xb7f0ecc9 in find_shell (shell=0x8074b95 "") at plm_rsh_module.c:1459
#2 0xb7f0ce8b in setup_launch (argcptr=0xbfce5960, argvptr=0xbfce5968,
     nodename=0x80795c0 "fortune", node_name_index1=0xbfce5970,
     proc_vpid_index=0xbfce596c, prefix_dir=0x805b028 "/tmp/openmpi_inst")
     at plm_rsh_module.c:376
#3 0xb7f0e181 in orte_plm_rsh_launch (jdata=0x80539a8)
     at plm_rsh_module.c:1051
#4 0x0804a8eb in orterun (argc=4, argv=0xbfce5b74) at orterun.c:680
#5 0x0804a012 in main (argc=Cannot access memory at address 0x1
) at main.c:13
(gdb)
---------------------------
The segfault clearly occurs because the shell field returned by
getpwuid(getuid()) is empty (as it is in /etc/passwd). As far as I
understand, an empty shell field in the passwd file is perfectly valid and
is an alias for /bin/sh (see man 5 passwd).
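
The fallback described above can be sketched outside of Open MPI as follows. This is only an illustration under stated assumptions: effective_shell() and current_user_shell() are hypothetical helper names, not functions from the Open MPI source, and the NULL check is an extra precaution beyond what the report discusses.

```c
#include <pwd.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of Open MPI): map a pw_shell value to the
 * shell that should actually be used. NULL or "" falls back to /bin/sh,
 * following the passwd(5) convention mentioned in the report. */
static const char *effective_shell(const char *pw_shell)
{
    if (pw_shell == NULL || pw_shell[0] == '\0') {
        return "/bin/sh";
    }
    return pw_shell;
}

/* Hypothetical helper: look up the calling user's login shell.
 * Returns NULL if the user has no passwd entry. */
static const char *current_user_shell(void)
{
    struct passwd *p = getpwuid(getuid());
    if (p == NULL) {
        return NULL;
    }
    return effective_shell(p->pw_shell);
}
```

With these helpers, a caller would pass the result of current_user_shell() on to whatever code inspects the shell name, so that code never sees an empty string.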

So, I guess in that case the setup_launch() function should just have
an additional check for an empty pw_shell. Something like this:
-----------------------------------------------
--- openmpi-1.3.1/orte/mca/plm/rsh/plm_rsh_module.c.orig 2009-03-24 06:22:06.000000000 +0100
+++ openmpi-1.3.1/orte/mca/plm/rsh/plm_rsh_module.c 2009-03-24 06:24:07.000000000 +0100
@@ -372,8 +372,11 @@
          orte_show_help( "help-plm-rsh.txt", "unknown-user", true, (int)getuid() );
          return ORTE_ERR_FATAL;
      } else {
- param = p->pw_shell;
- local_shell = find_shell(p->pw_shell);
+ if (!((p->pw_shell)[0]))
+ param="/bin/sh";
+ else
+ param = p->pw_shell;
+ local_shell = find_shell(param);
      }
      /* If we didn't find it in getpwuid(), try looking at the $SHELL
       environment variable (see https://svn.open-mpi.org/trac/ompi/ticket/1060)
----------------------
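
A complementary option is to make the shell-name lookup itself tolerate unusual input. The backtrace shows the crash in strcmp() called from find_shell() with shell="" and a failing address of 0x1. One plausible reading, not confirmed here against the Open MPI source, is that the code steps past the last '/' in the path even when there is none. The sketch below is a hypothetical stand-in, not the real find_shell(): shell_basename(), classify_shell(), and the known_shells table are all assumed names and contents.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table of known shell names; the real list in
 * plm_rsh_module.c may differ. */
static const char *const known_shells[] = {
    "bash", "zsh", "tcsh", "csh", "ksh", "sh"
};

/* Return the part of `path` after the last '/', or `path` itself if it
 * contains no '/'. Returns "" for NULL input. */
static const char *shell_basename(const char *path)
{
    const char *slash;
    if (path == NULL) {
        return "";
    }
    slash = strrchr(path, '/');
    return (slash != NULL) ? slash + 1 : path;
}

/* Return the index of the shell in known_shells, or -1 if it is unknown
 * (which includes empty and NULL input). */
static int classify_shell(const char *path)
{
    const char *name = shell_basename(path);
    size_t i;
    for (i = 0; i < sizeof(known_shells) / sizeof(known_shells[0]); ++i) {
        if (strcmp(name, known_shells[i]) == 0) {
            return (int)i;
        }
    }
    return -1;
}
```

With this shape, an empty or slash-less shell string yields "unknown" instead of an invalid pointer, and the caller can then decide whether to fall back to /bin/sh.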

Regards,
         Sergey

*******************************************************************
Sergey E. Koposov
Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg Astronomical Institute
Tel: +49-6221-528-349
Web: http://lnfm1.sai.msu.ru/~math
E-mail: math_at_[hidden]