Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Openmpi 1.6.5 is freezing under GNU/Linux ia64
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-11-16 12:07:50


Sigh - if you go to line 205 of plm_rsh_component.c (the file named in your backtrace) and remove that OPAL_OUTPUT_VERBOSE print statement, the segfault should go away.

However, that won't solve the root problem - you'll just exit cleanly with an error message. The issue is that we aren't finding ssh or rsh in your PATH. Do you have one or both of those installed?
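
One quick way to check this outside of Open MPI is sketched below in Python. It is only an illustration of a PATH lookup, not Open MPI code: the function name `find_launch_agents` is made up for this example, and it assumes the only agents of interest are `ssh` and `rsh`.

```python
import shutil


def find_launch_agents(agents=("ssh", "rsh")):
    """Map each agent name to its absolute path on PATH, or None if absent.

    Hypothetical helper for illustration only; Open MPI does its own lookup.
    """
    return {name: shutil.which(name) for name in agents}


if __name__ == "__main__":
    # Print where each agent was found, if anywhere.
    for name, path in find_launch_agents().items():
        print(f"{name}: {path or 'not found in PATH'}")
```

If both entries come back as not found, that would be consistent with the "rsh path NULL" line in your verbose output.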

On Nov 16, 2013, at 2:33 AM, Sylvestre Ledru <sylvestre_at_[hidden]> wrote:

> On 15/11/2013 17:50, Ralph Castain wrote:
>> Hmm...well, that will make debugging a tad more difficult. I've attached a patch that *should* stop the segfault. Given that behavior, though, it looks like the system isn't finding either rsh or ssh on your machine. That might be the root cause of the problem.
> With your patch:
> $ ./mpirun -mca plm_base_verbose 5 -mca ras_base_verbose 5 -mcarmaps_base_verbose 5 -mca ess_base_verbose 5 -c 4 foo
> [merulo:08821] mca:base:select:( plm) Querying component [rsh]
> [merulo:08821] [[INVALID],INVALID] plm:base:rsh_lookup on agent ssh : rsh path NULL
> [merulo:08821] *** Process received signal ***
> [merulo:08821] Signal: Segmentation fault (11)
> [merulo:08821] Signal code: Invalid permissions (2)
> [merulo:08821] Failing at address: (nil)
> [merulo:08821] [ 0] linux-gate.so.1(__kernel_sigtramp+0x7fffffffff886860) [0xa000000000040800]
> [merulo:08821] [ 1] /home/sylvestre/bogus2/lib/openmpi/mca_plm_rsh.so(orte_plm_rsh_component_query+0xae3b0) [0x2000000000867f30]
> [merulo:08821] [ 2] /home/sylvestre/bogus2/lib/libopen-rte.so.4(mca_base_select-0x5dc110) [0x20000000001ddea0]
> [merulo:08821] [ 3] /home/sylvestre/bogus2/lib/libopen-rte.so.4(orte_plm_base_select-0x680cd0) [0x20000000001392f0]
> [merulo:08821] [ 4] /home/sylvestre/bogus2/lib/openmpi/mca_ess_hnp.so(+0x56f0) [0x20000000008316f0]
> [merulo:08821] [ 5] /home/sylvestre/bogus2/lib/libopen-rte.so.4(orte_init-0x72bf10) [0x200000000008e0c0]
> [merulo:08821] [ 6] ./mpirun(orterun+0x1fffffffff84cc80) [0x4000000000006c60]
> [merulo:08821] [ 7] ./mpirun(main+0x1fffffffff84b880) [0x40000000000045e0]
> [merulo:08821] [ 8] /lib/ia64-linux-gnu/libc.so.6.1(__libc_start_main-0x2fcd50) [0x20000000004bd2a0]
> [merulo:08821] [ 9] ./mpirun(_start+0x1fffffffff84a3c0) [0x40000000000043c0]
> [merulo:08821] *** End of error message ***
> Segmentation fault
>
> bt:
> Program received signal SIGSEGV, Segmentation fault.
> 0x2000000000867f30 in orte_plm_rsh_component_query (
> module=0x60000fffffffb0e8, priority=0x60000fffffffb0e0)
> at plm_rsh_component.c:205
> 205 OPAL_OUTPUT_VERBOSE((1, orte_plm_globals.output,
> (gdb) bt
> #0 0x2000000000867f30 in orte_plm_rsh_component_query (
> module=0x60000fffffffb0e8, priority=0x60000fffffffb0e0)
> at plm_rsh_component.c:205
> #1 0x20000000001ddea0 in mca_base_select (
> type_name=0x200000000026e708 "plm", output_id=8,
> components_available=0x20000000002c5f08 <orte_plm_base>,
> best_module=0x60000fffffffb0f0, best_component=0x60000fffffffb0f8)
> at mca_base_components_select.c:76
> #2 0x20000000001392f0 in orte_plm_base_select () at base/plm_base_select.c:46
> #3 0x20000000008316f0 in rte_init () at ess_hnp_module.c:169
> #4 0x200000000008e0c0 in orte_init (pargc=0x60000fffffffb370,
> pargv=0x60000fffffffb378, flags=4) at runtime/orte_init.c:127
> #5 0x4000000000006c60 in orterun (argc=15, argv=0x60000fffffffb628)
> at orterun.c:693
> #6 0x40000000000045e0 in main (argc=15, argv=0x60000fffffffb628) at main.c:13
>
> S