
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] RML Send
From: Ralph H Castain (rhc_at_[hidden])
Date: 2008-06-19 14:17:12


Okay, I've traced this down. The problem is that DSS-internal functions have
been exposed via the API, so now people can mistakenly call the wrong ones.
You should -never- be using opal_dss.pack_buffer or opal_dss.unpack_buffer.
Those were supposed to be internal to the DSS only, and will definitely mess
you up if called directly.
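
For reference, here is a minimal sketch of the public calls you should be
using instead (the header path and the OBJ_NEW/OBJ_RELEASE buffer handling
follow the usual OMPI conventions; the int32 round-trip is purely an
illustration, not code lifted from the tree):

#include "opal/dss/dss.h"

static int dss_roundtrip_example(void)
{
    opal_buffer_t *buffer = OBJ_NEW(opal_buffer_t);
    int32_t value = 42, out = 0, n = 1;

    /* public API: pack one int32 into the buffer */
    opal_dss.pack(buffer, &value, 1, OPAL_INT32);

    /* public API: unpack it again; n says how many values we expect */
    opal_dss.unpack(buffer, &out, &n, OPAL_INT32);

    OBJ_RELEASE(buffer);
    return (out == value) ? 0 : 1;
}

Everything goes through opal_dss.pack and opal_dss.unpack - the
pack_buffer/unpack_buffer entry points should never appear in your code.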

I'll fix this problem to avoid future issues. There is a comment in dss.h
that warns you never to call those functions, but who would remember?

I sure wouldn't. I've only avoided the problem because of ignorance - I
didn't know those APIs existed!

Should have a fix in later today.
Ralph

On 6/19/08 8:43 AM, "Ralph H Castain" <rhc_at_[hidden]> wrote:

> WOW! Somebody really screwed up the DSS by adding some new APIs I'd never
> heard of before - ones that really can cause the system to break!
>
> I'm going to have to straighten this mess out - it is a total disaster.
> There needs to be just ONE way of packing and unpacking, not two totally
> incompatible methods.
>
> Will let you know when it is fixed - probably early next week.
> Ralph
>
>
>
> On 6/19/08 8:34 AM, "Leonardo Fialho" <lfialho_at_[hidden]> wrote:
>
>> Hi Ralph,
>>
>> My mistake, I'm really using ORTE_PROC_MY_DAEMON->jobid.
>>
>> I had success using pack_buffer()/unpack_buffer() with the OPAL_BYTE type,
>> but something strange occurred when I was using pack()/unpack(): the value
>> of num_bytes increased. For example, I tried to read num_bytes=5, and after
>> the unpack the variable held 33! I don't understand it...
>>
>> Thanks,
>> Leonardo Fialho
>>
>> Ralph Castain wrote:
>>>
>>> On 6/17/08 3:35 PM, "Leonardo Fialho" <lfialho_at_[hidden]> wrote:
>>>
>>>
>>>> Hi Ralph,
>>>>
>>>> 1) Yes, I'm using ORTE_RML_TAG_DAEMON with a new "command" that I
>>>> defined in "odls_types.h".
>>>> 2) I'm packing and unpacking variables like OPAL_INT, OPAL_SIZE, ...
>>>> 3) I'm not blocking the "process_commands" function with long code.
>>>> 4) To learn the daemon's vpid and jobid I used the same jobid as the
>>>> app (in this solution it can be changed), and the vpid is ordered
>>>> sequentially (0 for mpirun and 1 to N for the orteds).
>>>>
>>>
>>> The jobid of the daemons is different from the jobid of the apps. So at the
>>> moment, you are actually sending the message to another app!
>>>
>>> You can find the jobid of the daemons by extracting it as
>>> ORTE_PROC_MY_DAEMON->jobid. Please note, though, that the app has no
>>> knowledge of the contact info for that daemon, so this message will have to
>>> route through the local daemon. This happens transparently, but I just
>>> wanted to be clear about how this works.
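>>>
>>> As a rough sketch of that addressing (target_daemon_vpid and buffer are
>>> placeholders here - the vpid is however you identify the remote daemon,
>>> and buffer is the opal_buffer_t you already packed):
>>>
>>> orte_process_name_t daemon;
>>>
>>> daemon.jobid = ORTE_PROC_MY_DAEMON->jobid;  /* the daemons' jobid, not the app's */
>>> daemon.vpid = target_daemon_vpid;           /* placeholder: vpid of the target daemon */
>>>
>>> /* routed transparently through the local daemon */
>>> orte_rml.send_buffer(&daemon, buffer, ORTE_RML_TAG_DAEMON, 0);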
>>>
>>>
>>>> The problem is: I need to send buffered data, and I don't know the
>>>> type of this data. I'm trying to use OPAL_NULL and OPAL_DATA_VALUE to
>>>> send it, but with no success... :(
>>>>
>>>
>>> If I recall correctly, you were trying to archive messages that flowed
>>> through the PML - correct? I would suggest just treating them as bytes and
>>> packing them as an opal_byte_object_t, something like this:
>>>
>>> opal_byte_object_t bo;
>>>
>>> bo.size = num_bytes;   /* however many bytes my_data holds */
>>> bo.bytes = my_data;    /* pointer to the raw bytes */
>>>
>>> opal_dss.pack(buffer, &bo, 1, OPAL_BYTE_OBJECT);
>>>
>>> Then on the other end:
>>>
>>> opal_byte_object_t *bo;
>>> int32_t n = 1;   /* how many objects we expect to unpack */
>>>
>>> opal_dss.unpack(buffer, &bo, &n, OPAL_BYTE_OBJECT);
>>>
>>> You can then transfer the data into whatever storage you like. All this does
>>> is pass the #bytes and the bytes as a collected unit - you could, of course,
>>> simply pass the #bytes and bytes with independent packs if you wanted:
>>>
>>> int32_t num_bytes;
>>> uint8_t *my_data;
>>>
>>> opal_dss.pack(buffer, &num_bytes, 1, OPAL_INT32);
>>> opal_dss.pack(buffer, my_data, num_bytes, OPAL_BYTE);
>>>
>>> ...
>>>
>>> int32_t n = 1;
>>> opal_dss.unpack(buffer, &num_bytes, &n, OPAL_INT32);
>>> my_data = (uint8_t*)malloc(num_bytes);
>>> opal_dss.unpack(buffer, my_data, &num_bytes, OPAL_BYTE);
>>>
>>>
>>> Up to you.
>>>
>>> Hope that helps
>>> Ralph
>>>
>>>
>>>> Thanks in advance,
>>>> Leonardo Fialho
>>>>
>>>>
>>>> Ralph H Castain wrote:
>>>>
>>>>> I'm not sure exactly how you are trying to do this, but the usual
>>>>> procedure
>>>>> would be:
>>>>>
>>>>> 1. call opal_dss.pack(*buffer, *data, #data, data_type) for each thing you
>>>>> want to put in the buffer. So you might call this to pack a string:
>>>>>
>>>>> opal_dss.pack(buffer, &string, 1, OPAL_STRING);
>>>>>
>>>>> 2. once you have everything packed into the buffer, you send the buffer
>>>>> with
>>>>>
>>>>> orte_rml.send_buffer(dest, buffer, dest_tag, 0);
>>>>>
>>>>> What you will need is a tag that the daemon is listening on that won't
>>>>> interfere with its normal operations - i.e., what you send won't get held
>>>>> forever waiting to get serviced, and your servicing won't block us from
>>>>> responding to a ctrl-c. You can probably use ORTE_RML_TAG_DAEMON, but you
>>>>> need to ensure you don't block anything.
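>>>>>
>>>>> Putting those two steps together, a minimal sketch (the OBJ_NEW/OBJ_RELEASE
>>>>> buffer handling is the usual OMPI convention, and dest stands for whatever
>>>>> daemon name you end up targeting):
>>>>>
>>>>> opal_buffer_t *buffer = OBJ_NEW(opal_buffer_t);
>>>>> char *string = "log entry";
>>>>>
>>>>> /* pack whatever you want to send */
>>>>> opal_dss.pack(buffer, &string, 1, OPAL_STRING);
>>>>>
>>>>> /* send it to the daemon on a tag it is already listening on */
>>>>> orte_rml.send_buffer(dest, buffer, ORTE_RML_TAG_DAEMON, 0);
>>>>>
>>>>> OBJ_RELEASE(buffer);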
>>>>>
>>>>> BTW: how is the app figuring out the name of the remote daemon? The proc
>>>>> will have access to the daemon's vpid (assuming it knows the nodename
>>>>> where
>>>>> the daemon is running) in the ESS, but not the jobid - I assume you are
>>>>> using some method to compute the daemon jobid from the app's jobid?
>>>>>
>>>>>
>>>>> On 6/17/08 12:08 PM, "Leonardo Fialho" <lfialho_at_[hidden]> wrote:
>>>>>
>>>>>
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I'm using RML to send log messages from a PML to an ORTE daemon (located
>>>>>> on another node). I succeeded in sending the message header, but now I
>>>>>> need to send the message data (buffer). How can I do it? The problem is
>>>>>> which data type I need to use for packing/unpacking. I tried
>>>>>> OPAL_DATA_VALUE but had no success...
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>
>>>>>
>>>
>>>
>>>
>>
>
>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel