Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Pointers for understanding failure messages on NetBSD
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-12-01 20:00:24


I believe this is saying that we are not finding any TCP interfaces - the ioctl itself is failing. So yes - mpirun is going to fail at that point, because we have no way to communicate for launch.

Do you see interfaces if you do an /sbin/ifconfig? Do they have valid IP addresses?

On Dec 1, 2009, at 5:52 PM, Jeff Squyres wrote:

> On Nov 29, 2009, at 6:15 PM, <Kevin.Buckley_at_[hidden]> wrote:
>
>> $ mpirun -n 4 hello_f77
>> [somebox.ecs.vuw.ac.nz:04414] opal_ifinit: ioctl(SIOCGIFFLAGS) failed with
>> errno=6
>>
>
> Oy. This is ick, because this error code is coming from horrendously complex code deep in the depths of OMPI that is probing the OS to figure out what ethernet interfaces you have. It may or may not be simple to fix this.
>
> Do you mind diving into the OMPI code a bit to figure this out? I'm afraid that none of the developers are likely to have access to NetBSD. :-( I can point you right where to look.
>
>> When running on a "server" machine within the grid, a machine I am told
>> should not be any different to the workstation I was using above in
>> respect of user environment, I get a different error and find that the
>> job does not run at all.
>>
>> This case seems to produce an error message that is oft reported within
>> the OpenMPI community:
>>
>> $ mpirun -n 4 hello_f77
>> [somebox2.ecs.vuw.ac.nz:25244] [[51186,0],0] ORTE_ERROR_LOG: Error in file
>> ess_hnp_module.c at line 150
>> --------------------------------------------------------------------------
>> It looks like orte_init failed for some reason; your parallel process is
>> ...
>>
>> orte_rml_base_select failed
>> --> Returned value Error (-1) instead of ORTE_SUCCESS
>>
>
> This could well be a side-effect of the same error as above -- OMPI may have concluded that it didn't find any ethernet devices and therefore aborted.
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]