This document describes the protocols and structures used by Rserve (version 0.1-9). This information is helpful for implementing Rserve clients.
Rserve communication is performed over any reliable connection-oriented protocol (usually TCP/IP; Rserve 0.1-9 supports TCP/IP and local Unix sockets). After the connection is established, the server sends 32 bytes representing the ID string, which defines the capabilities of the server. Each attribute of the ID string is 4 bytes long and is meant to be user-readable (i.e. it uses no special characters); it is a good idea to make "\r\n\r\n" the last attribute.
The ID string must be of the form:
[0] "Rsrv" - R-server ID signature
[4] "0100" - version of the R server
[8] "QAP1" - protocol used for communication (here Quad Attributes Packets v1)
[12] any additional attributes follow. \r\n and '-' are ignored.
Optional attributes (in any order; it is legitimate to put dummy attributes, like "----" or "    ", between attributes):
"R151" - version of R (here 1.5.1)
"ARpt" - authorization required (here "pt"=plain text, "uc"=unix crypt); the connection will be closed if the first packet is not CMD_login. If more than one AR.. method is specified, the client is free to use any one it supports (usually the most secure).
"K***" - key, if encoded authentication is challenged (*** is the key); for unix crypt the first two letters of the key are the salt required by the server.
The protocol specified in the third attribute (here QAP1) is used immediately after the ID string has been transmitted.
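The ID string layout described above can be sketched as a small parser. This is only an illustration: the function name `parse_id_string` and the returned dictionary keys are hypothetical, and treating any attribute made up solely of '-', '\r', '\n' or spaces as a dummy is an assumption based on the rules stated above.

```python
def parse_id_string(raw: bytes) -> dict:
    """Split a 32-byte Rserve ID string into its 4-byte attributes."""
    if len(raw) != 32:
        raise ValueError("ID string must be exactly 32 bytes")
    attrs = [raw[i:i + 4].decode("ascii") for i in range(0, 32, 4)]
    if attrs[0] != "Rsrv":
        raise ValueError("missing 'Rsrv' signature")

    info = {
        "version": attrs[1],    # e.g. "0100"
        "protocol": attrs[2],   # e.g. "QAP1"
        "r_version": None,      # from "R151" -> "151"
        "auth": [],             # from "ARpt", "ARuc" -> ["pt", "uc"]
        "key": None,            # from "K***" -> "***"
    }
    for attr in attrs[3:]:
        # Assumption: attributes consisting only of filler characters are dummies.
        if attr.strip("-\r\n ") == "":
            continue
        if attr.startswith("AR"):
            info["auth"].append(attr[2:])
        elif attr.startswith("K"):
            info["key"] = attr[1:]
        elif attr.startswith("R"):
            info["r_version"] = attr[1:]
    return info
```

A client would typically read exactly 32 bytes after connecting, pass them to such a function, and then decide whether CMD_login has to be its first packet.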
QAP1 message oriented protocol
QAP1 (quad attributes protocol v1) is a message oriented protocol, i.e. the initiating side (here the client) sends a message and awaits a response. The message contains both the action to be taken and any necessary data. The response contains a response code and any associated data. Every message consists of a header and data part (which can be empty). The header is structured as follows:
[0] (int) command
[4] (int) length (size of the message minus the 16-byte header)
[8] (int) offset of the data part
[12] (int) reserved (must be 0)
command specifies the request or response type.
length specifies the number of bytes belonging to this message after the header.
offset specifies the offset of the data part, where 0 means directly after the header (which is normally the case)
reserved is kept for future use and must be 0
The header must always be transmitted en bloc. The data part can be split into packets of arbitrary size. Each message consists of 16 bytes (the header) plus data, so a message is length+16 bytes long.
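The header layout can be sketched with Python's `struct` module. This is a minimal sketch, not the Rserve implementation: the helper names are hypothetical, the fields are packed as unsigned little-endian 32-bit integers (the byte order this document specifies for all ints), and interpreting offset as a position inside the length bytes that follow the header is an assumption.

```python
import socket
import struct

# Four little-endian 32-bit fields: command, length, offset, reserved.
HEADER = struct.Struct("<IIII")


def pack_message(command: int, data: bytes = b"") -> bytes:
    """Build a complete message: 16-byte header followed by the data part."""
    return HEADER.pack(command, len(data), 0, 0) + data


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since data may arrive in packets of any size."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed in the middle of a message")
        buf.extend(chunk)
    return bytes(buf)


def read_message(sock: socket.socket):
    """Read one message and return (command, data part)."""
    command, length, offset, _reserved = HEADER.unpack(recv_exact(sock, HEADER.size))
    body = recv_exact(sock, length)
    # Assumption: offset counts from the end of the header (0 = directly after it).
    return command, body[offset:]
```

Sending the header and data in a single `sendall(pack_message(...))` call keeps the header in one piece on the sending side; the receiving side still has to loop, as `recv_exact` does.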
The data part contains any additional parameters that are sent along with the command. Each parameter starts with a 4-byte header of the form:
[0] (byte) type
[1] (24-bit int) length
Types used by the current Rserve implementation (for list of all supported types see Rsrv.h):
- DT_INT (4 bytes) integer
- DT_STRING (n bytes) null terminated string
- DT_BYTESTREAM (n bytes) any binary data
- DT_SEXP R's encoded SEXP, see below
All int and double entries throughout the transfer are encoded in Intel (little-endian) byte order:
int=0x12345678 -> char[4]=(0x78,0x56,0x34,0x12)
Functions/macros for converting from native to protocol format are available in Rsrv.h.
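The parameter header and the byte order can be illustrated with a short encoder. The numeric type codes below are not given in this document; they are assumed values that should be checked against Rsrv.h. The helper names and the choice of UTF-8 for string data are likewise assumptions made for the sketch.

```python
import struct

# Assumed type codes (not listed in this document; verify against Rsrv.h).
DT_INT = 1
DT_STRING = 4


def encode_param(dtype: int, payload: bytes) -> bytes:
    """Prefix payload with the 4-byte header: 1 type byte + 24-bit length."""
    if len(payload) >= 1 << 24:
        raise ValueError("payload does not fit into a 24-bit length")
    return bytes([dtype]) + len(payload).to_bytes(3, "little") + payload


def encode_int(value: int) -> bytes:
    """DT_INT parameter: 4 bytes, little-endian."""
    return encode_param(DT_INT, struct.pack("<i", value))


def encode_string(text: str) -> bytes:
    """DT_STRING parameter: null-terminated string (no padding applied here)."""
    return encode_param(DT_STRING, text.encode("utf-8") + b"\x00")
```

The 24-bit length is written with `int.to_bytes(3, "little")` so that it follows the same byte order as the 32-bit ints.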
Commands supported by Rserve
Supported commands:
command            parameters          | response data
CMD_login          DT_STRING           | -
CMD_voidEval       DT_STRING           | -
CMD_eval           DT_STRING           | DT_SEXP
CMD_shutdown       [DT_STRING]         | -
CMD_openFile       DT_STRING           | -
CMD_createFile     DT_STRING           | -
CMD_closeFile      -                   | -
CMD_readFile       [DT_INT]            | DT_BYTESTREAM
CMD_writeFile      DT_BYTESTREAM       | -
CMD_removeFile     DT_STRING           | -
CMD_setSEXP        DT_STRING, DT_SEXP  | -
CMD_assignSEXP     DT_STRING, DT_SEXP  | -
CMD_setBufferSize  DT_INT              | -
(Parameters in brackets [] are optional)
Responses:
The CMD_RESP mask is set for all responses. Each response consists of the response command (RESP_OK or RESP_ERR, stored in the least significant 24 bits) and the status code (most significant 8 bits). For a list of all currently supported status codes see ERR_... in Rsrv.h.
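Splitting the command field of a response into its two parts can be sketched as follows. The constant values are assumptions (this document does not list them; check Rsrv.h), and the function name is hypothetical.

```python
# Assumed values (not listed in this document; verify against Rsrv.h).
CMD_RESP = 0x10000
RESP_OK = CMD_RESP | 0x0001
RESP_ERR = CMD_RESP | 0x0002


def split_response(command: int):
    """Return (response command, status code) from a response header field."""
    resp = command & 0x00FFFFFF      # least significant 24 bits
    status = (command >> 24) & 0xFF  # most significant 8 bits
    if not resp & CMD_RESP:
        raise ValueError("CMD_RESP mask not set: not a response")
    return resp, status
```

A client would compare the first element with RESP_OK or RESP_ERR and look up the second among the ERR_... codes.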
Encoding of SEXP R expression
R SEXP values (DT_SEXP) are recursively encoded in a similar way to the parameter attributes. Each SEXP consists of a 4-byte header and the actual contents. The header is of the form:
[0] (byte) eXpression Type
[1] (24-bit int) length
The expression type consists of the actual type (least significant 6 bits) and attributes. The following expression types are supported:
XT_NULL data: -
XT_INT data: (4) int
XT_DOUBLE data: (8) double
XT_STR data: (n) char null-terminated string
XT_LANG data: same as XT_LIST
XT_SYM data: (n) char symbol name
XT_BOOL data: (1) byte boolean (1=TRUE, 0=FALSE, 2=NA)
XT_VECTOR data: (n*?) SEXP
XT_LIST data: SEXP head, SEXP vals, [SEXP tag]
XT_CLOS data: SEXP formals, SEXP body
XT_ARRAY_INT data: (n*4) int,int,..
XT_ARRAY_DOUBLE data: (n*8) double,double,..
XT_ARRAY_STR data: (?) string,string,..
XT_ARRAY_BOOL data: (n) byte,byte,..
XT_UNKNOWN data: (4) int - SEXP type as defined in R
Attributes:
XT_HAS_ATTR - if this flag is set then the SEXP has an attribute which is stored before the actual expression. In this case the layout looks as follows:
[0] (4) header SEXP: len=4+m+n, XT_HAS_ATTR is set
[4] (4) header attribute SEXP: len=n
[8] (n) data attribute SEXP
[8+n] (m) data SEXP
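The attribute layout above can be sketched as a decoder that separates the attribute SEXP from the actual data. The value of XT_HAS_ATTR is an assumption (check Rsrv.h), the function names are hypothetical, and only 4-byte headers are handled.

```python
# Assumed flag value (not listed in this document; verify against Rsrv.h).
XT_HAS_ATTR = 0x80


def read_sexp_header(buf: bytes, pos: int = 0):
    """Return (base type, has_attr, length, position of the contents)."""
    xtype = buf[pos]
    length = int.from_bytes(buf[pos + 1:pos + 4], "little")
    return xtype & 0x3F, bool(xtype & XT_HAS_ATTR), length, pos + 4


def split_sexp(buf: bytes, pos: int = 0):
    """Return (base type, attribute SEXP bytes or None, data bytes)."""
    base, has_attr, length, body = read_sexp_header(buf, pos)
    end = body + length  # length covers attribute header, attribute data and data
    attr = None
    if has_attr:
        _, _, attr_len, attr_body = read_sexp_header(buf, body)
        # Keep the attribute with its own header so it can be decoded recursively.
        attr = buf[body:attr_body + attr_len]
        body = attr_body + attr_len
    return base, attr, buf[body:end]
```

Because the attribute is itself a complete SEXP, the returned attribute bytes can be passed back into `split_sexp`.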
Additions in version 0.2
Since version 0.2-0 the ID string reports version 0101 because of a change that makes it partially incompatible with previous versions. The main change is that Rserve reporting version 0100 incorrectly omitted the DT_SEXP header from the response to CMD_eval commands. Clients should therefore check the version reported by Rserve and provide a workaround (for 0100 you can assume that CMD_eval always returns the contents of a SEXP even though no DT_SEXP header is sent). Rserve reporting 0101 responds consistently, i.e. the proper DT_SEXP header is sent.
The second change is the requirement to pad strings with zeros so that the length of the parameter/content is divisible by 4. Depending on the platform used, the server may respond with ERR_inv_par if the parameters are not correctly aligned. Rserve reporting 0101 will itself pad strings in this manner when sending responses to the client.
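The padding rule can be expressed in a few lines. This is a sketch under the assumption that the terminating null byte is counted before padding and that strings are sent as UTF-8; the function name is hypothetical.

```python
def pad_string(text: str) -> bytes:
    """Null-terminate text and pad with zeros to a multiple of 4 bytes."""
    raw = text.encode("utf-8") + b"\x00"
    return raw + b"\x00" * (-len(raw) % 4)
```

The expression `-len(raw) % 4` gives the number of zero bytes still needed to reach the next multiple of 4, and is 0 when the length is already aligned.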
Update 2003-09-18: The previous documentation incorrectly stated that the second entry of the 4-byte headers (response and attribute) was a 12-bit int, whereas it is in fact a 24-bit int. This has now been corrected.
Additions in version 0.3
Rserve version 0.3 reports ID string version 0102 because support for large data was added. Previous versions were limited by the 24-bit length of parameters and SEXPs. Version 0.3 enhances the protocol by adding a special flag, DT_LARGE for parameter types and XT_LARGE for eXpression types. If this flag is set, the header is 8 bytes long (instead of the previous 4 bytes). The additional 4 bytes are used for the parameter/expression length, giving a maximum length of 56 bits for an expression or parameter (that is 65536TB, which should be sufficient). Any data smaller than 0x800000 (8MB) must still be coded in the original 4-byte header format. Current Rserve sends only data larger than 0xfffff0 (16MB-16) in the large data format. Clients are encouraged to use the same threshold, but this is not required by the protocol.
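Choosing between the 4-byte and the 8-byte header can be sketched as below. Two points are assumptions not stated in this document and should be verified against Rsrv.h: the flag value 0x40, and the split of the 56-bit length into the low 24 bits (in the original length field) and the high 32 bits (in the additional 4 bytes). The function names are hypothetical.

```python
# Assumed flag value (same bit for DT_LARGE and XT_LARGE; verify against Rsrv.h).
LARGE_FLAG = 0x40
# Threshold used by current Rserve according to the text above.
LARGE_THRESHOLD = 0xFFFFF0


def encode_header(type_code: int, length: int) -> bytes:
    """Return a 4-byte header, or an 8-byte one for data above the threshold."""
    if length <= LARGE_THRESHOLD:
        return bytes([type_code]) + length.to_bytes(3, "little")
    if length >= 1 << 56:
        raise ValueError("length does not fit into 56 bits")
    low = length & 0xFFFFFF   # assumed: low 24 bits stay in the first 4 bytes
    high = length >> 24       # assumed: high 32 bits go into the extra 4 bytes
    return (bytes([type_code | LARGE_FLAG])
            + low.to_bytes(3, "little")
            + high.to_bytes(4, "little"))


def decode_header(buf: bytes):
    """Return (type code without the large flag, length, header size)."""
    type_code = buf[0]
    length = int.from_bytes(buf[1:4], "little")
    if type_code & LARGE_FLAG:
        length |= int.from_bytes(buf[4:8], "little") << 24
        return type_code & ~LARGE_FLAG & 0xFF, length, 8
    return type_code, length, 4
```

A decoder written this way accepts both formats, so it keeps working if a peer uses a different threshold above the 0x800000 minimum.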