
Subject: Re: [OMPI users] setsockopt() fails with EINVAL on solaris
From: Daniel Junglas (daniel.junglas_at_[hidden])
Date: 2012-07-31 02:46:21


Thanks,

configuring with '--enable-mca-no-build=rmcast' did the trick for me.

Daniel

users-bounces_at_[hidden] wrote on 07/30/2012 04:21:13 PM:
> FWIW: the rmcast framework shouldn't be in 1.6. Jeff and I are
> testing removal and should have it out of there soon.
>
> Meantime, the best solution is to "--enable-mca-no-build rmcast"
>
> On Jul 30, 2012, at 7:15 AM, TERRY DONTJE wrote:
>
> Do you know which r# (revision) of 1.6 you were trying to compile? Is this
> via the tarball or svn?
>
> thanks,
>
> --td
>
> On 7/30/2012 9:41 AM, Daniel Junglas wrote:
> Hi,
>
> I compiled Open MPI 1.6 on a 64-bit Solaris UltraSPARC machine.
> Compilation and installation worked without a problem. However,
> when trying to run an application with mpirun I always hit
> this error:
>
> [hostname:14798] [[50433,0],0] rmcast:init: setsockopt() failed on MULTICAST_IF
>         for multicast network xxx.xxx.xxx.xxx interface xxx.xxx.xxx.xxx
>         Error: Invalid argument (22)
> [hostname:14798] [[50433,0],0] ORTE_ERROR_LOG: Error in file
> ../../../../../openmpi-1.6/orte/mca/rmcast/udp/rmcast_udp.c at line 825
> [hostname:14798] [[50433,0],0] ORTE_ERROR_LOG: Error in file
> ../../../../../openmpi-1.6/orte/mca/rmcast/udp/rmcast_udp.c at line 744
> [hostname:14798] [[50433,0],0] ORTE_ERROR_LOG: Error in file
> ../../../../../openmpi-1.6/orte/mca/rmcast/udp/rmcast_udp.c at line 193
> [hostname:14798] [[50433,0],0] ORTE_ERROR_LOG: Error in file
> ../../../../openmpi-1.6/orte/mca/rmcast/base/rmcast_base_select.c at line 56
> [hostname:14798] [[50433,0],0] ORTE_ERROR_LOG: Error in file
> ../../../../../openmpi-1.6/orte/mca/ess/hnp/ess_hnp_module.c at line 233
>
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_rmcast_base_select failed
> --> Returned value Error (-1) instead of ORTE_SUCCESS
>
>
> After some digging I found that the following patch seems to fix the
> problem (at least the application seems to run correctly now):
> --- a/orte/mca/rmcast/udp/rmcast_udp.c Tue Apr 3 16:30:29 2012
> +++ b/orte/mca/rmcast/udp/rmcast_udp.c Mon Jul 30 15:12:02 2012
> @@ -936,9 +936,16 @@
>          }
>      } else {
>          /* on the xmit side, need to set the interface */
> +        void const *addrptr;
>          memset(&inaddr, 0, sizeof(inaddr));
>          inaddr.sin_addr.s_addr = htonl(chan->interface);
> +#ifdef __sun
> +        addrlen = sizeof(inaddr.sin_addr);
> +        addrptr = (void *)&inaddr.sin_addr;
> +#else
>          addrlen = sizeof(struct sockaddr_in);
> +        addrptr = (void *)&inaddr;
> +#endif
> 
>          OPAL_OUTPUT_VERBOSE((2, orte_rmcast_base.rmcast_output,
>                               "setup:socket:xmit interface %03d.%03d.%03d.%03d",
> @@ -945,7 +952,7 @@
>                               OPAL_IF_FORMAT_ADDR(chan->interface)));
> 
>          if ((setsockopt(target_sd, IPPROTO_IP, IP_MULTICAST_IF,
> -                        (void *)&inaddr, addrlen)) < 0) {
> +                        addrptr, addrlen)) < 0) {
>              opal_output(0, "%s rmcast:init: setsockopt() failed on MULTICAST_IF\n"
>                          "\tfor multicast network %03d.%03d.%03d.%03d interface %03d.%03d.%03d.%03d\n\tError: %s (%d)",
>                          ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
> Can anybody confirm that the patch is good/correct? In particular
> that the '__sun' part is the right thing to do?
>
> Thanks,
>
> Daniel
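
The patch above comes down to handing setsockopt() a struct in_addr (via
&inaddr.sin_addr with sizeof(inaddr.sin_addr)) instead of the whole struct
sockaddr_in when selecting the transmit interface. The classic IPv4 multicast
API defines the IP_MULTICAST_IF option value as a struct in_addr, which is
presumably why Solaris rejects the sockaddr_in-sized argument with EINVAL
while other stacks tolerate it. Below is a minimal standalone sketch of the
portable form of the call; it is not Open MPI code, and the interface address
192.168.1.10 is only a placeholder:

/* Sketch: select the outgoing IPv4 multicast interface by passing a
 * struct in_addr to setsockopt(), the form defined by the classic
 * multicast API.  The address 192.168.1.10 stands in for the local
 * interface to transmit on. */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void)
{
    int sd = socket(AF_INET, SOCK_DGRAM, 0);
    if (sd < 0) {
        perror("socket");
        return 1;
    }

    struct in_addr ifaddr;                       /* option value is an in_addr */
    memset(&ifaddr, 0, sizeof(ifaddr));
    ifaddr.s_addr = inet_addr("192.168.1.10");   /* placeholder interface */

    if (setsockopt(sd, IPPROTO_IP, IP_MULTICAST_IF,
                   (void *)&ifaddr, sizeof(ifaddr)) < 0) {
        fprintf(stderr, "setsockopt(IP_MULTICAST_IF): %s (%d)\n",
                strerror(errno), errno);
        close(sd);
        return 1;
    }

    printf("outgoing multicast interface set\n");
    close(sd);
    return 0;
}

(On Solaris this typically needs -lsocket -lnsl at link time.) The '__sun'
branch of the patch does exactly this, so the open question above is really
whether the struct in_addr form could be used unconditionally rather than
only behind the #ifdef.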

>

> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> --
> Terry D. Dontje | Principal Software Engineer
> Developer Tools Engineering | +1.781.442.2631
> Oracle - Performance Technologies
> 95 Network Drive, Burlington, MA 01803
> Email terry.dontje_at_[hidden]
