Open MPI Development Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2006-01-04 07:50:41


On Dec 30, 2005, at 4:15 AM, Graziano Giuliani wrote:

> #0 0xb7ca2599 in orte_pls_rsh_launch (jobid=1) at pls_rsh_module.c:716
> 716 if (mca_pls_rsh_component.debug) {
>
> which means we have a memory corruption somewhere else...

Agreed.

> Investigating from outside on what may cause the problem, I have found
> that I can make the job run also changing the hostname in my hostfile.
>
> -) No localhost in hostfile -> run
> -) "wowbagger" or "localhost" in hostfile -> run
> -) FQDN wowbagger.cluster in hostfile -> SIGSEGV

LOL -- I did a double take there because one of our machines is named
wowbagger; I had a horrid moment where I was wondering if that name
somehow accidentally got hard-coded in the OMPI code base. :-)

Ok, I think that I am able to reproduce this -- got to love these
Heisenbugs. :-(

Let me see what I can dig up...

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/