Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Address not mapped segmentation fault with 1.4.2 ...
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-06-10 11:52:32


Not offhand, but just to close the loop on a question from your first mail: this should not be a memory manager issue (i.e., not related to IB).

As Ralph noted, this is a segv in the launcher (mpirun, in this case) -- specifically in the tm_init() function call (TM is the launcher helper library in PBS/Torque). Open MPI (mpirun, in this case) calls tm_init() to set up the PBS launcher -- it's the first PBS-specific function call that we make. If tm_init() itself crashes, it may indicate that something fairly basic is busted in that support library.
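
One way to check whether the TM/PBS support library really is the same file everywhere is to fingerprint it on the build node and on each compute node, then compare the output. The following is only a minimal sketch, not an Open MPI or PBS tool: the fingerprint() helper is hypothetical, and the library paths are whatever you pass on the command line (for example, the library reached through the 'default' symlink). It resolves symlinks first, so it also shows which real file a link points to on that node.

```python
import hashlib
import os
import socket
import sys


def fingerprint(path):
    """Return (resolved_path, sha256_hex) for the file that `path` points to.

    Hypothetical helper for illustration only; not part of Open MPI or PBS.
    """
    # Follow symlinks (e.g. a 'default' link) to the real file on this node.
    resolved = os.path.realpath(path)
    digest = hashlib.sha256()
    with open(resolved, "rb") as f:
        # Read in 1 MiB chunks so large libraries are not loaded all at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return resolved, digest.hexdigest()


if __name__ == "__main__":
    # Usage: python fingerprint.py <library path> [<library path> ...]
    host = socket.gethostname()
    for lib in sys.argv[1:]:
        resolved, sha = fingerprint(lib)
        print(f"{host}\t{lib}\t{resolved}\t{sha}")
```

If the resolved path or the hash differs between the build node and a compute node, the nodes are not seeing the same library. Matching hashes only rule out a mismatch; they do not show that the PBS build itself is sound.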

On Jun 10, 2010, at 11:12 AM, Richard Walsh wrote:

>
> Ralph/Jeff,
>
> Yes, the change was intentional. I have upgraded PBS as well and built
> 1.4.2 pointing to the new PBS via a symbolic link to 'default' which allows one
> to control the actual default without changing the path. I did the same thing
> on the non-IB system which seems to be working fine with 1.4.2. This would
> suggest that this is not the issue.
>
> It is possible that the PBS build in the IB system was flawed, but it looked
> normal. I could rebuild it. The PBS libraries (as well as MPI) are in a shared
> location that is NFS mounted on the compute nodes so things should be in
> sync, but I will verify this.
>
> Any other suggestions ... ??
>
> rbw
>
>
> Richard Walsh
> Parallel Applications and Systems Manager
> CUNY HPC Center, Staten Island, NY
> 718-982-3319
> 612-382-4620
>
> Mighty the Wizard
> Who found me at sunrise
> Sleeping, and woke me
> And learn'd me Magic!
> ________________________________________
> From: users-bounces_at_[hidden] [users-bounces_at_[hidden]] On Behalf Of Jeff Squyres [jsquyres_at_[hidden]]
> Sent: Thursday, June 10, 2010 11:00 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] Address not mapped segmentation fault with 1.4.2 ...
>
> On Jun 10, 2010, at 10:57 AM, Ralph Castain wrote:
>
> > That error would indicate something wrong with the PBS connection - it is tm_init that is crashing. I note that you did --with-tm pointing to a different location - was that intentional? Could be something wrong with that PBS build.
>
> ...and make sure that the support libs for TM/PBS are the same between the node you're building on and all the nodes where OMPI will be running.
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/