Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Pointers for understanding failure messages on NetBSD
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-12-01 19:52:59


On Nov 29, 2009, at 6:15 PM, <Kevin.Buckley_at_[hidden]> wrote:

> $ mpirun -n 4 hello_f77
> [somebox.ecs.vuw.ac.nz:04414] opal_ifinit: ioctl(SIOCGIFFLAGS) failed with errno=6
>

Oy. This is ick, because this error code is coming from horrendously
complex code deep in the depths of OMPI that is probing the OS to
figure out what ethernet interfaces you have. It may or may not be
simple to fix this.
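
For anyone reading the log line above: the `errno=6` is a raw OS error number, and it can be turned into a symbolic name and message on the machine that produced it. The snippet below is a minimal sketch using only the Python standard library; the helper name `describe_errno` is made up for illustration and is not part of Open MPI. Errno numbering is platform-specific, so it should be run on the affected host. On NetBSD and Linux, 6 corresponds to ENXIO.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for a numeric errno on this platform.

    Hypothetical helper for illustration; not an Open MPI function.
    """
    # errno.errorcode maps numbers to symbolic names for the local OS.
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the local C library's message for the code.
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # 6 is the value reported by opal_ifinit in the quoted log line.
    print(describe_errno(6))
```

The message text varies between operating systems, so only the symbolic name should be compared across platforms.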

Do you mind diving into the OMPI code a bit to figure this out? I'm
afraid that none of the developers are likely to have access to
NetBSD. :-( I can point you right where to look.

> When running on a "server" machine within the grid, a machine I am told
> should not be any different to the workstation I was using above in
> respect of user environment, I get a different error and find that the
> job does not run at all.
>
> This case seems to produce an error message that is oft reported within
> the OpenMPI community:
>
> $ mpirun -n 4 hello_f77
> [somebox2.ecs.vuw.ac.nz:25244] [[51186,0],0] ORTE_ERROR_LOG: Error in file ess_hnp_module.c at line 150
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> ...
>
> orte_rml_base_select failed
> --> Returned value Error (-1) instead of ORTE_SUCCESS
>

This could well be a side-effect of the same error as above -- OMPI
may have concluded that it didn't find any ethernet devices and
therefore aborted.

-- 
Jeff Squyres
jsquyres_at_[hidden]