OK ... so if I follow your lead and build a version without PBS --tm= integration
and it works, I should be able to report this as an incompatibility bug between
the latest version of PBS Pro (10.2.0.93147) and the latest version of OpenMPI
(1.4.2), right? Do I report that to my friends at OpenMPI or my friends at
PBS Pro (Altair), or both?
Thanks for your help. I will let you know what the result is ...
Parallel Applications and Systems Manager
CUNY HPC Center, Staten Island, NY
Mighty the Wizard
Who found me at sunrise
Sleeping, and woke me
And learn'd me Magic!
From: users-bounces_at_[hidden] [users-bounces_at_[hidden]] On Behalf Of Jeff Squyres [jsquyres_at_[hidden]]
Sent: Thursday, June 10, 2010 11:52 AM
To: Open MPI Users
Subject: Re: [OMPI users] Address not mapped segmentation fault with 1.4.2 ...
Not offhand, but just to close the loop on a question from your first mail: this should not be a memory manager issue (i.e., not related to IB).
As Ralph noted, this is a segv in the launcher (mpirun, in this case) -- in the tm_init() function call (TM is the launcher helper library in PBS/Torque). Open MPI (mpirun, in this case) calls tm_init() to set up the PBS launcher -- it's the first PBS-specific function call that we make. If tm_init() fails, it may indicate that something fairly basic is busted in that support library.
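One way to rule out a mismatched support library is to compare a checksum of the TM/PBS library file on the build node against the same file on each compute node. The following is only a rough sketch in Python, not anything shipped with Open MPI or PBS: the function names are made up for illustration, and the library path is something you would have to supply yourself (whatever file your --with-tm prefix actually resolves to).

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`.

    `path` is a placeholder: pass the TM/PBS library file that your
    Open MPI build was actually linked against.
    """
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read in chunks so a large library is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_nodes(digests_by_node, reference_node):
    """Return the sorted names of nodes whose digest differs from the reference.

    `digests_by_node` maps a node name (hypothetical labels, chosen by you)
    to the digest that file_sha256() produced on that node.
    """
    reference = digests_by_node[reference_node]
    return sorted(
        node for node, digest in digests_by_node.items() if digest != reference
    )
```

Run file_sha256() on the build node and on each compute node, collect the results into a dictionary, and pass it to mismatched_nodes() with the build node as the reference. An empty list only says the files are byte-identical; it does not prove the library itself is sound.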
On Jun 10, 2010, at 11:12 AM, Richard Walsh wrote:
> Yes, the change was intentional. I have upgraded PBS as well and built
> 1.4.2 pointing to the new PBS via a symbolic link to 'default' which allows one
> to control the actual default without changing the path. I did the same thing
> on the non-IB system which seems to be working fine with 1.4.2. This would
> suggest that this is not the issue.
> It is possible that the PBS build in the IB system was flawed, but it looked
> normal. I could rebuild it. The PBS libraries (as well as MPI) are in a shared
> location that is NFS mounted on the compute nodes so things should be in
> sync, but I will verify this.
> Any other suggestions ... ??
> Richard Walsh
> Parallel Applications and Systems Manager
> CUNY HPC Center, Staten Island, NY
> Mighty the Wizard
> Who found me at sunrise
> Sleeping, and woke me
> And learn'd me Magic!
> From: users-bounces_at_[hidden] [users-bounces_at_[hidden]] On Behalf Of Jeff Squyres [jsquyres_at_[hidden]]
> Sent: Thursday, June 10, 2010 11:00 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] Address not mapped segmentation fault with 1.4.2 ...
> On Jun 10, 2010, at 10:57 AM, Ralph Castain wrote:
> > That error would indicate something wrong with the PBS connection - it is tm_init that is crashing. I note that you did --with-tm pointing to a different location - was that intentional? Could be something wrong with that PBS build.
> ...and make sure that the support libs for TM/PBS are the same between the node you're building on and all the nodes where OMPI will be running.
> Jeff Squyres