Open MPI User's Mailing List Archives


From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2006-04-10 13:56:54


Did you put this in /etc/bashrc on all the nodes in question?

It is usually easier to modify your own personal startup files, such as
$HOME/.bashrc. See the OMPI FAQ if you need help picking the right
shell startup file for your environment.

You might want to modify your shell startup files and then try running
your sample core-dumper program on another node via rsh/ssh and see if
you get a corefile. E.g.:

shell$ rsh othernode my_core_dumper_app

If you don't get a corefile, then something isn't right (e.g., you
edited the wrong file, the file isn't being read, or the file exits
early because it's a non-interactive shell, etc.).
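For concreteness, a minimal sketch of that check (assuming bash; the
node name `othernode` is a placeholder for one of your compute nodes):

```shell
# Allow unlimited-size core files in the current shell:
ulimit -c unlimited
ulimit -c    # prints: unlimited

# To make this stick for the non-interactive shells that mpirun starts
# over rsh/ssh, add the same line to your personal startup file:
#   echo 'ulimit -c unlimited' >> ~/.bashrc

# Then confirm the remote shell really reads it:
#   ssh othernode 'ulimit -c'    # should also print: unlimited
```

If the remote `ulimit -c` still prints `0`, the startup file is not
being read by non-interactive shells on that node.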

> -----Original Message-----
> From: users-bounces_at_[hidden]
> [mailto:users-bounces_at_[hidden]] On Behalf Of Adams Samuel
> D Contr AFRL/HEDR
> Sent: Monday, April 10, 2006 1:06 PM
> To: 'Open MPI Users'
> Subject: Re: [OMPI users] job running question
>
> I put it in /etc/bashrc and opened a new shell, but I am still not
> seeing any core files.
>
> Sam Adams
> General Dynamics - Network Systems
> Phone: 210.536.5945
>
> -----Original Message-----
> From: users-bounces_at_[hidden]
> [mailto:users-bounces_at_[hidden]] On
> Behalf Of Pavel Shamis (Pasha)
> Sent: Monday, April 10, 2006 8:56 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] job running question
>
> Mpirun opens a separate shell on each machine/node, so the "ulimit"
> will not be available in the new shell. I think if you add "ulimit -c
> unlimited" to your default shell configuration file (~/.bashrc for
> BASH, ~/.tcshrc for TCSH/CSH), you will find your core files :)
>
> Regards,
> Pavel Shamis (Pasha)
>
> Adams Samuel D Contr AFRL/HEDR wrote:
> > I set bash to have unlimited size core files like this:
> >
> > $ ulimit -c unlimited
> >
> > But, it was not dropping core files for some reason when I was
> > running with mpirun. Just to make sure it would do what I expected,
> > I wrote a little C program that was kind of like this:
> >
> > int ptr = 4;
> > fprintf(stderr, "bad! %s\n", (char*)ptr);
> >
> > That would give a segmentation fault. It dropped a core file like
> > you would expect. Am I missing something?
> >
> > Sam Adams
> > General Dynamics - Network Systems
> > Phone: 210.536.5945
> >
> > -----Original Message-----
> > From: users-bounces_at_[hidden]
> [mailto:users-bounces_at_[hidden]] On
> > Behalf Of Jeff Squyres (jsquyres)
> > Sent: Saturday, April 08, 2006 6:25 AM
> > To: Open MPI Users
> > Subject: Re: [OMPI users] job running question
> >
> > Some process is exiting on a segv -- are you getting any corefiles?
> >
> > If not, can you increase your coredumpsize to unlimited? This
> > should let you get a corefile; can you send the backtrace from
> > that corefile?
> >
> >
> >> -----Original Message-----
> >> From: users-bounces_at_[hidden]
> >> [mailto:users-bounces_at_[hidden]] On Behalf Of Adams Samuel
> >> D Contr AFRL/HEDR
> >> Sent: Friday, April 07, 2006 11:53 AM
> >> To: 'users_at_[hidden]'
> >> Subject: [OMPI users] job running question
> >>
> >> We are trying to build a new cluster running OpenMPI. We were
> >> previously running LAM-MPI. To run jobs we would do the following:
> >>
> >> $ lamboot lam-host-file
> >> $ mpirun C program
> >>
> >> I am not sure if this works more or less the same way with ompi.
> >> We were trying to run it like this:
> >>
> >> [james.parker_at_Cent01 FORTRAN]$ mpirun --np 2 f_5x5 localhost
> >> mpirun noticed that job rank 1 with PID 0 on node "localhost"
> >> exited on signal 11.
> >> [Cent01.brooks.afmc.ds.af.mil:16124] ERROR: A daemon on node
> >> localhost failed to start as expected.
> >> [Cent01.brooks.afmc.ds.af.mil:16124] ERROR: There may be more
> >> information available from
> >> [Cent01.brooks.afmc.ds.af.mil:16124] ERROR: the remote shell
> >> (see above).
> >> [Cent01.brooks.afmc.ds.af.mil:16124] The daemon received a signal 11.
> >> 1 additional process aborted (not shown)
> >> [james.parker_at_Cent01 FORTRAN]$
> >>
> >> We have ompi installed to /usr/local, and these are our environment
> >> variables:
> >>
> >> [james.parker_at_Cent01 FORTRAN]$ export
> >> declare -x COLORTERM="gnome-terminal"
> >> declare -x DBUS_SESSION_BUS_ADDRESS="unix:abstract=/tmp/dbus-sfzFctmRFS"
> >> declare -x DESKTOP_SESSION="default"
> >> declare -x DISPLAY=":0.0"
> >> declare -x GDMSESSION="default"
> >> declare -x GNOME_DESKTOP_SESSION_ID="Default"
> >> declare -x GNOME_KEYRING_SOCKET="/tmp/keyring-x8WQ1E/socket"
> >> declare -x GTK_RC_FILES="/etc/gtk/gtkrc:/home/BROOKS-2K/james.parker/.gtkrc-1.2-gnome2"
> >> declare -x G_BROKEN_FILENAMES="1"
> >> declare -x HISTSIZE="1000"
> >> declare -x HOME="/home/BROOKS-2K/james.parker"
> >> declare -x HOSTNAME="Cent01"
> >> declare -x INPUTRC="/etc/inputrc"
> >> declare -x KDEDIR="/usr"
> >> declare -x LANG="en_US.UTF-8"
> >> declare -x LD_LIBRARY_PATH="/usr/local/lib:/usr/local/lib/openmpi"
> >> declare -x LESSOPEN="|/usr/bin/lesspipe.sh %s"
> >> declare -x LOGNAME="james.parker"
> >> declare -x LS_COLORS="no=00:fi=00:di=00;34:ln=00;36:pi=40;33:so=00;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=00;32:*.cmd=00;32:*.exe=00;32:*.com=00;32:*.btm=00;32:*.bat=00;32:*.sh=00;32:*.csh=00;32:*.tar=00;31:*.tgz=00;31:*.arj=00;31:*.taz=00;31:*.lzh=00;31:*.zip=00;31:*.z=00;31:*.Z=00;31:*.gz=00;31:*.bz2=00;31:*.bz=00;31:*.tz=00;31:*.rpm=00;31:*.cpio=00;31:*.jpg=00;35:*.gif=00;35:*.bmp=00;35:*.xbm=00;35:*.xpm=00;35:*.png=00;35:*.tif=00;35:"
> >> declare -x MAIL="/var/spool/mail/james.parker"
> >> declare -x OLDPWD="/home/BROOKS-2K/james.parker/build/SuperLU_DIST_2.0"
> >> declare -x PATH="/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/BROOKS-2K/james.parker/bin:/usr/local/bin"
> >> declare -x PERL5LIB="/usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi:/usr/lib/perl5/site_perl/5.8.5"
> >> declare -x PWD="/home/BROOKS-2K/james.parker/build/SuperLU_DIST_2.0/FORTRAN"
> >> declare -x SESSION_MANAGER="local/Cent01.brooks.afmc.ds.af.mil:/tmp/.ICE-unix/14516"
> >> declare -x SHELL="/bin/bash"
> >> declare -x SHLVL="2"
> >> declare -x SSH_AGENT_PID="14541"
> >> declare -x SSH_ASKPASS="/usr/libexec/openssh/gnome-ssh-askpass"
> >> declare -x SSH_AUTH_SOCK="/tmp/ssh-JUIxl14540/agent.14540"
> >> declare -x TERM="xterm"
> >> declare -x USER="james.parker"
> >> declare -x WINDOWID="35651663"
> >> declare -x XAUTHORITY="/home/BROOKS-2K/james.parker/.Xauthority"
> >> [james.parker_at_Cent01 FORTRAN]$
> >>
> >> Any ideas??
> >> _______________________________________________
> >> users mailing list
> >> users_at_[hidden]
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>