Open MPI logo

Open MPI Development Mailing List Archives

  |   Home   |   Support   |   FAQ   |  

This web mail archive is frozen.

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Click here to be taken to the new web archives of this list; it includes all the mails that are in this frozen archive plus all new mails that have been sent to the list since it was migrated to the new archives.

Subject: Re: [OMPI devel] Bug in opal sos changes
From: Abhishek Kulkarni (adkulkar_at_[hidden])
Date: 2010-05-18 15:54:05


On Tue, 18 May 2010, Rolf vandeVaart wrote:

> I think we are almost saying the same thing. But to be sure, I will restate. The call to opal_pointer_array_add() can return either an index (which I assume is a positive
> integer, maybe also 0?) or OPAL_ERR_OUT_OF_RESOURCE (which is a -2) if it cannot malloc anymore space in the table.  So, I guess I agree that the original code was wrong as
> it never would have detected the error since OMPI_ERROR != OPAL_ERR_OUT_OF_RESOURCE.  (-1 != -2)
>
> Since we are overloading the return value, it seems like the only thing we could do is something like this:
>
> if (new_group->grp_f_to_c_index < 0)
>    error();
>

Yes, that looks like the right thing to do.

> But that does not follow the SOS logic as the key is that we want to compare to OMPI_SUCCESS (I think).  Also, for what it is worth, the setting of the grp_f_to_c_index
> happens in the group constructor, so we cannot get at the return value of opal_pointer_array_add() except by looking at the grp_f_to_c value after the object is
> constructed.  I am not sure the correct way to handle this.
>

The only reason we replace the OMPI_ERROR checks with OMPI_SUCCESS is
because when SOS logs an error in its internal data structures it returns
a new reference to the error (an encoded error-code which SOS can use to
locate the error). So, OMPI_ERROR is not OMPI_ERROR anymore but an SOS
encoded OMPI_ERROR. We could always wrap the code to be checked with a
call to extract its native error code and then perform the check like

   if (0 > OPAL_SOS_GET_ERROR_CODE(new_group->grp_f_to_c_index)) {
      error();
   }

In a lot of places (where functions return a boolean OMPI_SUCCESS or
OMPI_ERROR), it was perfectly legit to just switch the way it's done but
for the opal_pointer_array_add() and mca_base_param_* functions which
return an index or an error, the above transformation seems to be the way
to go.

I'll send in a patch with these changes.

Abhishek

> Rolf
>
> On 05/18/10 14:02, Jeff Squyres wrote:
>
> Looks like the comparison to OMPI_ERROR worked by accident -- just because it happened to have a value of -1.
>
> The *_f_to_c_index values are the return value from a call to opal_point_array_add(). This value will either be non-negative or -1. -1 indicates a failure. It's not an *_
> ERR_* code -- it's a -1 index value. So the comparisons should really have been to -1 in the first place.
>
> Rolf / Abhishek -- can you fix? Did this happen in other *_f_to_c_index member variable comparisons elsewhere?
>
>
>
> On May 18, 2010, at 1:25 PM, Rolf vandeVaart wrote:
>
>
>
> I am getting SEGVs while running the IMB-MPI1 tests. I believe the
> problem has to do with changes made to the group_init.c file last
> night. The error occurs in the call to MPI_Comm_split.
>
> There were 4 changes in the file that look like this:
> OLD:
> if (OMPI_ERROR == new_group->grp_f_to_c_index)
>
> NEW:
> if (OMPI_SUCCESS != new_group->grp_f_to_c_index)
>
> If I change it back, things work. I understand the idea of changing the
> logic, but maybe that does not apply in this case? When running with
> np=2, the value of new_group->grp_f_to_c_index is 4, thereby not
> equaling OMPI_SUCCESS and we end up in an error condition which results
> in a null pointer later on.
>
> Am I the only that has run into this?
>
> Rolf
>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
>
>
>
>
>
>
>