Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] setsockopt() fails with EINVAL on solaris
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2012-07-30 10:20:11


Ralph actually suggests that we just remove rmcast from 1.6.1.
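
On the question of whether the `__sun` guard in the quoted patch is needed: the `IP_MULTICAST_IF` option is documented as taking a `struct in_addr`, and Linux appears to tolerate the larger `struct sockaddr_in` only because it also accepts longer membership structures for this option. If that reading is right, passing the `in_addr` and its size on every platform should work without an `#ifdef`. A minimal sketch under that assumption follows; `set_multicast_if` is a hypothetical helper written for illustration, not code from Open MPI, and it assumes the interface address is held in host byte order as `chan->interface` appears to be in the patch.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of Open MPI): select the outgoing
 * multicast interface for an IPv4 socket.
 *
 * iface_host_order is the interface's IPv4 address in host byte order.
 * Returns 0 on success, or -1 with errno set by setsockopt().
 */
int set_multicast_if(int sd, uint32_t iface_host_order)
{
    struct in_addr ifaddr;

    memset(&ifaddr, 0, sizeof(ifaddr));
    ifaddr.s_addr = htonl(iface_host_order);

    /* Pass the in_addr itself and its exact size, not a sockaddr_in. */
    return setsockopt(sd, IPPROTO_IP, IP_MULTICAST_IF,
                      (const void *)&ifaddr, sizeof(ifaddr));
}
```

Called as `set_multicast_if(target_sd, chan->interface)`, this would replace both branches of the `#ifdef` in the patch. Passing `INADDR_ANY` asks the stack to fall back to its default multicast interface.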

On Jul 30, 2012, at 10:15 AM, TERRY DONTJE wrote:

> Do you know what r# of 1.6 you were trying to compile? Is this via the tarball or svn?
>
> thanks,
>
> --td
>
> On 7/30/2012 9:41 AM, Daniel Junglas wrote:
>> Hi,
>>
>> I compiled OpenMPI 1.6 on a 64bit Solaris ultrasparc machine.
>> Compilation and installation worked without a problem. However,
>> when trying to run an application with mpirun I always faced
>> this error:
>>
>> [hostname:14798] [[50433,0],0] rmcast:init: setsockopt() failed on MULTICAST_IF
>> for multicast network xxx.xxx.xxx.xxx interface xxx.xxx.xxx.xxx
>> Error: Invalid argument (22)
>> [hostname:14798] [[50433,0],0] ORTE_ERROR_LOG: Error in file
>> ../../../../../openmpi-1.6/orte/mca/rmcast/udp/rmcast_udp.c at line 825
>> [hostname:14798] [[50433,0],0] ORTE_ERROR_LOG: Error in file
>> ../../../../../openmpi-1.6/orte/mca/rmcast/udp/rmcast_udp.c at line 744
>> [hostname:14798] [[50433,0],0] ORTE_ERROR_LOG: Error in file
>> ../../../../../openmpi-1.6/orte/mca/rmcast/udp/rmcast_udp.c at line 193
>> [hostname:14798] [[50433,0],0] ORTE_ERROR_LOG: Error in file
>> ../../../../openmpi-1.6/orte/mca/rmcast/base/rmcast_base_select.c at line 56
>> [hostname:14798] [[50433,0],0] ORTE_ERROR_LOG: Error in file
>> ../../../../../openmpi-1.6/orte/mca/ess/hnp/ess_hnp_module.c at line 233
>> --------------------------------------------------------------------------
>> It looks like orte_init failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during orte_init; some of which are due to configuration or
>> environment problems. This failure appears to be an internal failure;
>> here's some additional information (which may only be relevant to an
>> Open MPI developer):
>>
>> orte_rmcast_base_select failed
>> --> Returned value Error (-1) instead of ORTE_SUCCESS
>>
>>
>> After some digging I found that the following patch seems to fix the
>> problem (at least the application seems to run correctly now):
>>
>> --- a/orte/mca/rmcast/udp/rmcast_udp.c Tue Apr 3 16:30:29 2012
>> +++ b/orte/mca/rmcast/udp/rmcast_udp.c Mon Jul 30 15:12:02 2012
>> @@ -936,9 +936,16 @@
>>          }
>>      } else {
>>          /* on the xmit side, need to set the interface */
>> +        void const *addrptr;
>>          memset(&inaddr, 0, sizeof(inaddr));
>>          inaddr.sin_addr.s_addr = htonl(chan->interface);
>> +#ifdef __sun
>> +        addrlen = sizeof(inaddr.sin_addr);
>> +        addrptr = (void *)&inaddr.sin_addr;
>> +#else
>>          addrlen = sizeof(struct sockaddr_in);
>> +        addrptr = (void *)&inaddr;
>> +#endif
>>
>>          OPAL_OUTPUT_VERBOSE((2, orte_rmcast_base.rmcast_output,
>>                               "setup:socket:xmit interface %03d.%03d.%03d.%03d",
>> @@ -945,7 +952,7 @@
>>                               OPAL_IF_FORMAT_ADDR(chan->interface)));
>>
>>          if ((setsockopt(target_sd, IPPROTO_IP, IP_MULTICAST_IF,
>> -                        (void *)&inaddr, addrlen)) < 0) {
>> +                        addrptr, addrlen)) < 0) {
>>              opal_output(0, "%s rmcast:init: setsockopt() failed on MULTICAST_IF\n"
>>                          "\tfor multicast network %03d.%03d.%03d.%03d interface %03d.%03d.%03d.%03d\n\tError: %s (%d)",
>>                          ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
>>
>> Can anybody confirm that the patch is good/correct? In particular
>> that the '__sun' part is the right thing to do?
>>
>> Thanks,
>>
>> Daniel
>>
>>
>>
>> _______________________________________________
>> users mailing list
>>
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> --
> Terry D. Dontje | Principal Software Engineer
> Developer Tools Engineering | +1.781.442.2631
> Oracle - Performance Technologies
> 95 Network Drive, Burlington, MA 01803
> Email terry.dontje_at_[hidden]

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/