Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Seg fault with PBS Pro 10.4
From: Youri LACAN-BARTLEY (youri.lacan-bartley_at_[hidden])
Date: 2011-07-27 05:58:47


Hi,

For what it's worth: we're successfully running OMPI 1.4.3 compiled with gcc-4.1.2 along with PBS Pro 10.4.

Kind regards,

Youri LACAN-BARTLEY

-----Original Message-----
From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On Behalf Of Ralph Castain
Sent: Wednesday, July 27, 2011 02:49
To: Open MPI Users
Subject: Re: [OMPI users] Seg fault with PBS Pro 10.4

I don't believe we ever got anywhere with this due to lack of response. If you get some info on what happened to tm_init, please pass it along.

Best guess: something changed in a recent PBS Pro release. Since none of us have access to it, we don't know what's going on. :-(

On Jul 26, 2011, at 10:10 AM, Wood, Justin Contractor, SAIC wrote:

> I'm having a problem using Open MPI under PBS Pro 10.4. I tried both 1.4.3 and 1.5.3, and both behave the same. I can run just fine if I bypass PBS and go directly to the nodes. Under PBS it also works fine on a single node, but as soon as the job spans nodes, I get the following:
>
> [a4ou-n501:07366] *** Process received signal ***
> [a4ou-n501:07366] Signal: Segmentation fault (11)
> [a4ou-n501:07366] Signal code: Address not mapped (1)
> [a4ou-n501:07366] Failing at address: 0x3f
> [a4ou-n501:07366] [ 0] /lib64/libpthread.so.0 [0x3f2b20eb10]
> [a4ou-n501:07366] [ 1] /opt/ompi/1.4.3/intel/lib/libopen-rte.so.0(discui_+0x84) [0x2affa453765c]
> [a4ou-n501:07366] [ 2] /opt/ompi/1.4.3/intel/lib/libopen-rte.so.0(diswsi+0xc3) [0x2affa4534c6f]
> [a4ou-n501:07366] [ 3] /opt/ompi/1.4.3/intel/lib/libopen-rte.so.0 [0x2affa453290c]
> [a4ou-n501:07366] [ 4] /opt/ompi/1.4.3/intel/lib/libopen-rte.so.0(tm_init+0x1fe) [0x2affa4532bf8]
> [a4ou-n501:07366] [ 5] /opt/ompi/1.4.3/intel/lib/libopen-rte.so.0 [0x2affa452691c]
> [a4ou-n501:07366] [ 6] mpirun [0x404c17]
> [a4ou-n501:07366] [ 7] mpirun [0x403e28]
> [a4ou-n501:07366] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3f2a61d994]
> [a4ou-n501:07366] [ 9] mpirun [0x403d59]
> [a4ou-n501:07366] *** End of error message ***
> Segmentation fault
>
> I searched the archives and found a similar issue from last year:
>
> http://www.open-mpi.org/community/lists/users/2010/02/12084.php
>
> The last update I saw was that someone was going to contact Altair and have them look at why tm_init was failing. Does anyone have an update on this, and has anyone been able to run successfully with recent versions of PBS Pro? I've also contacted our rep at Altair, but he hasn't responded yet.
>
> Thanks, Justin.
>
> Justin Wood
> Systems Engineer
> FNMOC | SAIC
> 7 Grace Hopper, Stop 1
> Monterey, CA
> justin.g.wood.ctr_at_[hidden]
> justin.g.wood_at_[hidden]
> office: 831.656.4671
> mobile: 831.869.1576
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
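
When passing along information about the tm_init failure, it can help to reduce a backtrace like the one quoted above to its named frames. The following is a minimal sketch, assuming the frame layout seen in this thread ("[ N] module(symbol+0xoffset) [0xaddress]"). The names FRAME_RE and symbolized_frames are hypothetical helpers written for this illustration and are not part of Open MPI or PBS Pro.

```python
import posixpath
import re

# Assumed layout of one frame, taken from the trace in this thread:
#   [host:pid] [ N] /path/to/module(symbol+0xoff) [0xaddr]
# The "(symbol+0xoff)" part is absent for frames without symbol information.
FRAME_RE = re.compile(
    r"\[\s*(?P<index>\d+)\]\s+"                      # frame number, e.g. "[ 4]"
    r"(?P<module>[^\s(\[]+)"                         # binary or shared library
    r"(?:\((?P<symbol>[^+)]+)\+0x[0-9a-fA-F]+\))?"   # optional "(symbol+0xoff)"
    r"\s*\[0x[0-9a-fA-F]+\]"                         # return address
)


def symbolized_frames(trace_text):
    """Return (frame index, module file name, symbol) for every named frame."""
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.search(line)
        if match is None or match.group("symbol") is None:
            continue  # not a frame line, or a frame without a symbol
        frames.append(
            (
                int(match.group("index")),
                posixpath.basename(match.group("module")),
                match.group("symbol"),
            )
        )
    return frames


# A few lines copied from the trace in the original report.
SAMPLE_TRACE = """\
> [a4ou-n501:07366] Failing at address: 0x3f
> [a4ou-n501:07366] [ 0] /lib64/libpthread.so.0 [0x3f2b20eb10]
> [a4ou-n501:07366] [ 1] /opt/ompi/1.4.3/intel/lib/libopen-rte.so.0(discui_+0x84) [0x2affa453765c]
> [a4ou-n501:07366] [ 2] /opt/ompi/1.4.3/intel/lib/libopen-rte.so.0(diswsi+0xc3) [0x2affa4534c6f]
> [a4ou-n501:07366] [ 3] /opt/ompi/1.4.3/intel/lib/libopen-rte.so.0 [0x2affa453290c]
> [a4ou-n501:07366] [ 4] /opt/ompi/1.4.3/intel/lib/libopen-rte.so.0(tm_init+0x1fe) [0x2affa4532bf8]
"""

if __name__ == "__main__":
    for index, module, symbol in symbolized_frames(SAMPLE_TRACE):
        print(f"frame {index}: {symbol} in {module}")
```

Run against the sample lines, this lists frames 1, 2 and 4 (discui_, diswsi and tm_init, all in libopen-rte.so.0), which matches the call chain visible in the report: the fault occurs below tm_init. Frames without a symbol, such as frame 3, are skipped rather than guessed at.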
