Jeff/All,
OK ... so if I follow your lead and build a version without PBS (--with-tm)
integration and it works, I should be able to report this as an incompatibility
bug between the latest version of PBS Pro (10.2.0.93147) and the latest version
of OpenMPI (1.4.2), right? Do I report that to my friends at OpenMPI, my
friends at PBS Pro (Altair), or both?
Thanks for your help. I will let you know what the result is ...
rbw
Richard Walsh
Parallel Applications and Systems Manager
CUNY HPC Center, Staten Island, NY
718-982-3319
612-382-4620
Mighty the Wizard
Who found me at sunrise
Sleeping, and woke me
And learn'd me Magic!
________________________________________
From: users-bounces_at_[hidden] [users-bounces_at_[hidden]] On Behalf Of Jeff Squyres [jsquyres_at_[hidden]]
Sent: Thursday, June 10, 2010 11:52 AM
To: Open MPI Users
Subject: Re: [OMPI users] Address not mapped segmentation fault with 1.4.2 ...
Not offhand, but just to close the loop on a question from your first mail: this should not be a memory manager issue (i.e., not related to IB).
As Ralph noted, this is a segv in the launcher (mpirun, in this case) -- in the tm_init() function call (TM is the launcher helper library in PBS/Torque). Open MPI (mpirun, in this case) calls tm_init() to set up the PBS launcher -- it's the first PBS-specific function call that we make. If tm_init() fails, it may indicate that something fairly basic is busted in that support library.
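Since a crash in tm_init() points at the support library itself, one quick sanity check is to confirm that the build node and the compute nodes all resolve the same library file, including through a 'default' symlink. The following is only a minimal sketch of that idea, not anything Open MPI or PBS provides: the function name library_fingerprint is made up for illustration, and the library path you pass in depends entirely on your install.

```python
import hashlib
import os
import socket
import sys


def library_fingerprint(path):
    """Return (resolved_path, sha256_hex) for the file that `path` points to.

    Hypothetical helper for illustration only. Symlinks (for example a
    'default' link to a versioned PBS directory) are followed first, so two
    nodes can be compared on both the real path and the file contents.
    """
    resolved = os.path.realpath(path)
    digest = hashlib.sha256()
    with open(resolved, "rb") as fh:
        # Read in 1 MiB chunks so large libraries are not loaded at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return resolved, digest.hexdigest()


if __name__ == "__main__":
    # Pass one or more library paths on the command line; the paths are
    # site-specific and are not assumed here.
    for lib in sys.argv[1:]:
        resolved, sha = library_fingerprint(lib)
        print(f"{socket.gethostname()}  {sha}  {lib} -> {resolved}")
```

Running the script with the same library path on the build node and inside a job on a compute node should print identical checksums and resolved paths if the NFS-mounted copies really are in sync. A mismatch would not prove that tm_init() is at fault, but it would be worth fixing before filing a bug.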
On Jun 10, 2010, at 11:12 AM, Richard Walsh wrote:
>
> Ralph/Jeff,
>
> Yes, the change was intentional. I have upgraded PBS as well and built
> 1.4.2 pointing to the new PBS via a symbolic link to 'default' which allows one
> to control the actual default without changing the path. I did the same thing
> on the non-IB system which seems to be working fine with 1.4.2. This would
> suggest that this is not the issue.
>
> It is possible that the PBS build in the IB system was flawed, but it looked
> normal. I could rebuild it. The PBS libraries (as well as MPI) are in a shared
> location that is NFS mounted on the compute nodes so things should be in
> sync, but I will verify this.
>
> Any other suggestions ... ??
>
> rbw
>
>
> Richard Walsh
> Parallel Applications and Systems Manager
> CUNY HPC Center, Staten Island, NY
> 718-982-3319
> 612-382-4620
>
> Mighty the Wizard
> Who found me at sunrise
> Sleeping, and woke me
> And learn'd me Magic!
> ________________________________________
> From: users-bounces_at_[hidden] [users-bounces_at_[hidden]] On Behalf Of Jeff Squyres [jsquyres_at_[hidden]]
> Sent: Thursday, June 10, 2010 11:00 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] Address not mapped segmentation fault with 1.4.2 ...
>
> On Jun 10, 2010, at 10:57 AM, Ralph Castain wrote:
>
> > That error would indicate something wrong with the PBS connection: it is tm_init that is crashing. I note that you did --with-tm pointing to a different location. Was that intentional? There could be something wrong with that PBS build.
>
> ...and make sure that the support libs for TM/PBS are the same between the node you're building on and all the nodes where OMPI will be running.
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> Think green before you print this email.
>
--
Jeff Squyres
jsquyres_at_[hidden]