On 15/11/2013 17:50, Ralph Castain wrote:
Hmm... well, that will make debugging a tad more difficult. I've attached a patch that *should* stop the segfault. Given that behavior, though, it looks like the system isn't finding either rsh or ssh on your machine. Might be the root cause of the problem.
With your patch:
$ ./mpirun -mca plm_base_verbose 5 -mca ras_base_verbose 5 -mca rmaps_base_verbose 5 -mca ess_base_verbose 5 -c 4 foo
[merulo:08821] mca:base:select:(  plm) Querying component [rsh]
[merulo:08821] [[INVALID],INVALID] plm:base:rsh_lookup on agent ssh : rsh path NULL
[merulo:08821] *** Process received signal ***
[merulo:08821] Signal: Segmentation fault (11)
[merulo:08821] Signal code: Invalid permissions (2)
[merulo:08821] Failing at address: (nil)
[merulo:08821] [ 0] linux-gate.so.1(__kernel_sigtramp+0x7fffffffff886860) [0xa000000000040800]
[merulo:08821] [ 1] /home/sylvestre/bogus2/lib/openmpi/mca_plm_rsh.so(orte_plm_rsh_component_query+0xae3b0) [0x2000000000867f30]
[merulo:08821] [ 2] /home/sylvestre/bogus2/lib/libopen-rte.so.4(mca_base_select-0x5dc110) [0x20000000001ddea0]
[merulo:08821] [ 3] /home/sylvestre/bogus2/lib/libopen-rte.so.4(orte_plm_base_select-0x680cd0) [0x20000000001392f0]
[merulo:08821] [ 4] /home/sylvestre/bogus2/lib/openmpi/mca_ess_hnp.so(+0x56f0) [0x20000000008316f0]
[merulo:08821] [ 5] /home/sylvestre/bogus2/lib/libopen-rte.so.4(orte_init-0x72bf10) [0x200000000008e0c0]
[merulo:08821] [ 6] ./mpirun(orterun+0x1fffffffff84cc80) [0x4000000000006c60]
[merulo:08821] [ 7] ./mpirun(main+0x1fffffffff84b880) [0x40000000000045e0]
[merulo:08821] [ 8] /lib/ia64-linux-gnu/libc.so.6.1(__libc_start_main-0x2fcd50) [0x20000000004bd2a0]
[merulo:08821] [ 9] ./mpirun(_start+0x1fffffffff84a3c0) [0x40000000000043c0]
[merulo:08821] *** End of error message ***
Segmentation fault

bt:
Program received signal SIGSEGV, Segmentation fault.
0x2000000000867f30 in orte_plm_rsh_component_query (
    module=0x60000fffffffb0e8, priority=0x60000fffffffb0e0)
    at plm_rsh_component.c:205
205            OPAL_OUTPUT_VERBOSE((1, orte_plm_globals.output,
(gdb) bt
#0  0x2000000000867f30 in orte_plm_rsh_component_query (
    module=0x60000fffffffb0e8, priority=0x60000fffffffb0e0)
    at plm_rsh_component.c:205
#1  0x20000000001ddea0 in mca_base_select (
    type_name=0x200000000026e708 "plm", output_id=8,
    components_available=0x20000000002c5f08 <orte_plm_base>,
    best_module=0x60000fffffffb0f0, best_component=0x60000fffffffb0f8)
    at mca_base_components_select.c:76
#2  0x20000000001392f0 in orte_plm_base_select () at base/plm_base_select.c:46
#3  0x20000000008316f0 in rte_init () at ess_hnp_module.c:169
#4  0x200000000008e0c0 in orte_init (pargc=0x60000fffffffb370,
    pargv=0x60000fffffffb378, flags=4) at runtime/orte_init.c:127
#5  0x4000000000006c60 in orterun (argc=15, argv=0x60000fffffffb628)
    at orterun.c:693
#6  0x40000000000045e0 in main (argc=15, argv=0x60000fffffffb628) at main.c:13

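The verbose line `plm:base:rsh_lookup on agent ssh : rsh path NULL` agrees with the suspicion that neither launcher was found. Here is a quick way to check that independently of Open MPI. It is a minimal standalone sketch, not Open MPI's own code, and it assumes the lookup amounts to searching `$PATH` for `ssh` and then `rsh`. `find_in_path` and `find_launch_agent` are hypothetical helper names. The crash at plm_rsh_component.c:205 sits in a verbose-output call, so the sketch deliberately reports "not found" as a return code rather than as a NULL path that might end up in a format string.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: search each directory in $PATH for an executable
 * named `name`. On success, write the full path into `out` and return 0.
 * Return -1 if PATH is unset or nothing executable matches, so callers
 * never receive a NULL path to pass to a printf-style call. */
int find_in_path(const char *name, char *out, size_t outlen)
{
    const char *path = getenv("PATH");
    if (path == NULL || name == NULL || outlen == 0)
        return -1;

    char *copy = strdup(path); /* strtok_r modifies its input */
    if (copy == NULL)
        return -1;

    int rc = -1;
    char *saveptr = NULL;
    for (char *dir = strtok_r(copy, ":", &saveptr); dir != NULL;
         dir = strtok_r(NULL, ":", &saveptr)) {
        int n = snprintf(out, outlen, "%s/%s", dir, name);
        if (n < 0 || (size_t)n >= outlen)
            continue; /* candidate path did not fit; skip it */
        if (access(out, X_OK) == 0) {
            rc = 0;
            break;
        }
    }
    free(copy);
    return rc;
}

/* Hypothetical helper: try "ssh" first, then "rsh" (an assumed order that
 * mirrors "agent ssh : rsh" in the verbose output). Returns the agent name
 * that was found, with its full path in `out`, or NULL if neither exists. */
const char *find_launch_agent(char *out, size_t outlen)
{
    static const char *agents[] = { "ssh", "rsh" };
    for (size_t i = 0; i < sizeof agents / sizeof agents[0]; i++) {
        if (find_in_path(agents[i], out, outlen) == 0)
            return agents[i];
    }
    return NULL;
}
```

Calling `find_launch_agent(buf, sizeof buf)` from the same environment mpirun runs in, and printing `"(none)"` when it returns NULL, would show directly whether ssh or rsh is reachable there.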
S