“RDF Binary” is an efficient format for RDF and RDF-related data using Apache Thrift or Google Protocol Buffers as the binary data encoding.
The W3C standard RDF syntaxes are text or XML based. These incur costs in parsing; the most human-readable formats also incur high costs to write, and have limited scalability due to the need to analyse the data for pretty printing rather than simply stream to output.
Binary formats are faster to process because they do not incur the parsing costs of text-based formats. “RDF Binary” defines a basic encoding for RDF terms, then builds data formats for RDF graphs, RDF datasets, and SPARQL result sets. This gives a basis for high-performance linked data systems.
Thrift and Protobuf provide efficient, widely used binary encoding layers, each with a large number of language bindings.
For more details, see RDF Thrift.
Thrift encoding of RDF Terms
RDF Thrift uses the Thrift compact protocol.
Source: BinaryRDF.thrift
RDF terms
struct RDF_IRI {
1: required string iri
}
# A prefix name (abbrev for an IRI)
struct RDF_PrefixName {
1: required string prefix ;
2: required string localName ;
}
struct RDF_BNode {
1: required string label
}
struct RDF_Literal {
1: required string lex ;
2: optional string langtag ;
3: optional string datatype ;
4: optional RDF_PrefixName dtPrefix ;
}
struct RDF_Decimal {
1: required i64 value ;
2: required i32 scale ;
}
struct RDF_VAR {
1: required string name ;
}
struct RDF_ANY { }
struct RDF_UNDEF { }
struct RDF_REPEAT { }
union RDF_Term {
1: RDF_IRI iri
2: RDF_BNode bnode
3: RDF_Literal literal
4: RDF_PrefixName prefixName
5: RDF_VAR variable
6: RDF_ANY any
7: RDF_UNDEF undefined
8: RDF_REPEAT repeat
9: RDF_Triple tripleTerm # RDF-star
# Value forms of literals.
10: i64 valInteger
11: double valDouble
12: RDF_Decimal valDecimal
}
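As an illustration of the value forms, the sketch below converts between a decimal number and the (value, scale) pair of RDF_Decimal. It assumes the pair follows the unscaled-value and scale convention of Java's BigDecimal, so that the number equals value * 10^(-scale); the schema above does not spell this out, and the Python helper names are hypothetical.

```python
from decimal import Decimal
from typing import Tuple

I64_MIN, I64_MAX = -(1 << 63), (1 << 63) - 1


def decimal_to_pair(d: Decimal) -> Tuple[int, int]:
    """Split a finite decimal into (value, scale) with d == value * 10**(-scale).

    Assumption: same convention as Java's BigDecimal (unscaled value, scale).
    """
    if not d.is_finite():
        raise ValueError("NaN and infinities have no (value, scale) form")
    sign, digits, exponent = d.as_tuple()
    unscaled = int("".join(str(digit) for digit in digits))
    if sign:
        unscaled = -unscaled
    if not I64_MIN <= unscaled <= I64_MAX:
        raise ValueError("unscaled value does not fit the i64 'value' field")
    return unscaled, -exponent


def pair_to_decimal(value: int, scale: int) -> Decimal:
    """Rebuild the decimal from the two integer fields."""
    return Decimal(value).scaleb(-scale)


pair = decimal_to_pair(Decimal("12.50"))   # trailing zero is kept in the scale
restored = pair_to_decimal(*pair)
```

Under this reading the lexical form can be recovered without a separate string, which is the point of carrying the value directly.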
Thrift encoding of Triples, Quads and rows
struct RDF_Triple {
1: required RDF_Term S
2: required RDF_Term P
3: required RDF_Term O
}
struct RDF_Quad {
1: required RDF_Term S
2: required RDF_Term P
3: required RDF_Term O
4: optional RDF_Term G
}
struct RDF_PrefixDecl {
1: required string prefix ;
2: required string uri ;
}
Thrift encoding of RDF Graphs and RDF Datasets
union RDF_StreamRow {
1: RDF_PrefixDecl prefixDecl
2: RDF_Triple triple
3: RDF_Quad quad
}
RDF Graphs are encoded as a stream of RDF_Triple and RDF_PrefixDecl.
RDF Datasets are encoded as a stream of RDF_Triple, RDF_Quad and RDF_PrefixDecl.
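To make the row stream concrete, here is a minimal Python sketch of a reader for a graph stream. It models only IRIs and prefix names, and it assumes that a prefix name is resolved against the prefix declarations that appeared earlier in the same stream. The class and function names are hypothetical stand-ins for the generated Thrift types, not a real binding.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Union


@dataclass(frozen=True)
class IRI:                 # stand-in for RDF_IRI
    iri: str


@dataclass(frozen=True)
class PrefixName:          # stand-in for RDF_PrefixName
    prefix: str
    local_name: str


@dataclass(frozen=True)
class PrefixDecl:          # stand-in for RDF_PrefixDecl
    prefix: str
    uri: str


Term = Union[IRI, PrefixName]


@dataclass(frozen=True)
class Triple:              # stand-in for RDF_Triple
    s: Term
    p: Term
    o: Term


Row = Union[PrefixDecl, Triple]   # stand-in for RDF_StreamRow


def expand(term: Term, prefixes: Dict[str, str]) -> Term:
    """Replace a prefix name by the full IRI; leave other terms unchanged."""
    if isinstance(term, PrefixName):
        return IRI(prefixes[term.prefix] + term.local_name)
    return term


def read_graph(rows: Iterable[Row]) -> Iterator[Triple]:
    """Consume rows in order, tracking prefixes and yielding expanded triples."""
    prefixes: Dict[str, str] = {}
    for row in rows:
        if isinstance(row, PrefixDecl):
            prefixes[row.prefix] = row.uri
        else:
            yield Triple(expand(row.s, prefixes),
                         expand(row.p, prefixes),
                         expand(row.o, prefixes))


stream = [
    PrefixDecl("ex", "http://example.org/"),
    Triple(PrefixName("ex", "s"), PrefixName("ex", "p"), IRI("http://example.org/o")),
]
triples = list(read_graph(stream))
```

Because each row is handled as it arrives, the reader never needs the whole graph in memory.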
Thrift encoding of SPARQL Result Sets
A SPARQL Result Set is encoded as a list of variables (the header), then a stream of rows (the results).
struct RDF_VarTuple {
1: list<RDF_VAR> vars
}
struct RDF_DataTuple {
1: list<RDF_Term> row
}
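The header-then-rows layout can be sketched in Python as follows. This is an illustration under two assumptions that the schema does not state: each row holds one term per header variable, in header order, and an unbound variable is marked with RDF_UNDEF. Terms are shown as plain strings, and all names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

UNDEF = object()   # stand-in for an RDF_Term holding RDF_UNDEF


def encode_result_set(variables: Sequence[str],
                      solutions: Sequence[Dict[str, str]]) -> Tuple[List[str], List[list]]:
    """Return (header, rows): the variable list, then one positional row per solution."""
    header = list(variables)                      # plays the role of RDF_VarTuple
    rows = [[solution.get(var, UNDEF) for var in header]   # RDF_DataTuple
            for solution in solutions]
    return header, rows


def decode_result_set(header: Sequence[str], rows: Sequence[list]) -> List[Dict[str, str]]:
    """Rebuild variable bindings by position, dropping unbound slots."""
    return [{var: term for var, term in zip(header, row) if term is not UNDEF}
            for row in rows]


header, rows = encode_result_set(
    ["s", "o"],
    [{"s": "<http://example.org/a>", "o": "<http://example.org/b>"},
     {"s": "<http://example.org/c>"}],          # "o" is unbound in this solution
)
```

Sending the variable names once in the header keeps each row down to the terms themselves.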
Protobuf encoding of RDF Terms
The Protobuf schema is similar.
Source: binary-rdf.proto
Streaming is used to allow for arbitrary size graphs. Therefore the stream items (RDF_StreamRow below) are written with an initial length (writeDelimitedTo in the Java API).
See Protobuf Techniques Streaming.
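The length-prefixed framing can be sketched in Python without any Protobuf library. In the Java API, writeDelimitedTo writes the message size as a base-128 varint followed by the message bytes; the sketch below reproduces that framing and treats each serialized RDF_StreamRow as an opaque byte string. The function names are illustrative only.

```python
import io
from typing import BinaryIO, Iterator, Optional


def write_varint(out: BinaryIO, n: int) -> None:
    """Write a non-negative integer as a base-128 varint, low 7 bits first."""
    if n < 0:
        raise ValueError("length must be non-negative")
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.write(bytes([low7 | 0x80]))   # continuation bit: more bytes follow
        else:
            out.write(bytes([low7]))
            return


def read_varint(inp: BinaryIO) -> Optional[int]:
    """Read one varint; return None if the stream ends before its first byte."""
    result, shift = 0, 0
    while True:
        b = inp.read(1)
        if not b:
            if shift == 0:
                return None
            raise EOFError("stream ended inside a varint")
        result |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            return result
        shift += 7


def write_delimited(out: BinaryIO, payload: bytes) -> None:
    """Write one stream item: its length, then its bytes."""
    write_varint(out, len(payload))
    out.write(payload)


def read_delimited(inp: BinaryIO) -> Iterator[bytes]:
    """Yield stream items one at a time until the input is exhausted."""
    while True:
        size = read_varint(inp)
        if size is None:
            return
        payload = inp.read(size)
        if len(payload) != size:
            raise EOFError("stream ended inside an item")
        yield payload


buf = io.BytesIO()
write_delimited(buf, b"abc")          # placeholder bytes, not a real RDF_StreamRow
write_delimited(buf, b"x" * 300)
buf.seek(0)
items = list(read_delimited(buf))
```

The reader only ever holds one item at a time, which is what lets a graph of any size be processed as a stream.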
syntax = "proto3";
option java_package = "org.apache.jena.riot.protobuf.wire" ;
// Prefer one file with static inner classes.
option java_outer_classname = "PB_RDF" ;
// Optimize for speed (default)
option optimize_for = SPEED ;
//option java_multiple_files = true;
// ==== RDF Term Definitions
message RDF_IRI {
string iri = 1 ;
}
// A prefix name (abbrev for an IRI)
message RDF_PrefixName {
string prefix = 1 ;
string localName = 2 ;
}
message RDF_BNode {
string label = 1 ;
// 2 * fixed64
}
// Common abbreviations for datatypes and other URIs?
// union with additional values.
message RDF_Literal {
string lex = 1 ;
oneof literalKind {
bool simple = 9 ;
string langtag = 2 ;
string datatype = 3 ;
RDF_PrefixName dtPrefix = 4 ;
}
}
message RDF_Decimal {
sint64 value = 1 ;
sint32 scale = 2 ;
}
message RDF_Var {
string name = 1 ;
}
message RDF_ANY { }
message RDF_UNDEF { }
message RDF_REPEAT { }
message RDF_Term {
oneof term {
RDF_IRI iri = 1 ;
RDF_BNode bnode = 2 ;
RDF_Literal literal = 3 ;
RDF_PrefixName prefixName = 4 ;
RDF_Var variable = 5 ;
RDF_Triple tripleTerm = 6 ;
RDF_ANY any = 7 ;
RDF_UNDEF undefined = 8 ;
RDF_REPEAT repeat = 9 ;
// Value forms of literals.
sint64 valInteger = 20 ;
double valDouble = 21 ;
RDF_Decimal valDecimal = 22 ;
}
}
// === StreamRDF items
message RDF_Triple {
RDF_Term S = 1 ;
RDF_Term P = 2 ;
RDF_Term O = 3 ;
}
message RDF_Quad {
RDF_Term S = 1 ;
RDF_Term P = 2 ;
RDF_Term O = 3 ;
RDF_Term G = 4 ;
}
// Prefix declaration
message RDF_PrefixDecl {
string prefix = 1;
string uri = 2 ;
}
// StreamRDF
message RDF_StreamRow {
oneof row {
RDF_PrefixDecl prefixDecl = 1 ;
RDF_Triple triple = 2 ;
RDF_Quad quad = 3 ;
RDF_IRI base = 4 ;
}
}
message RDF_Stream {
repeated RDF_StreamRow row = 1 ;
}
// ==== SPARQL Result Sets
message RDF_VarTuple {
repeated RDF_Var vars = 1 ;
}
message RDF_DataTuple {
repeated RDF_Term row = 1 ;
}
// ==== RDF Graph
message RDF_Graph {
repeated RDF_Triple triple = 1 ;
}
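The schema declares valInteger and the RDF_Decimal fields as sint64 and sint32. In Protobuf these types are ZigZag-encoded before being written as varints, so that negative numbers of small magnitude stay short on the wire. A minimal Python sketch of the 64-bit mapping follows; the function names are illustrative and not part of any library.

```python
MASK64 = (1 << 64) - 1


def zigzag_encode64(n: int) -> int:
    """Map a signed 64-bit integer to an unsigned one: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    if not -(1 << 63) <= n < (1 << 63):
        raise ValueError("value does not fit in 64 bits")
    # n >> 63 is 0 for non-negative n and -1 (all ones) for negative n.
    return ((n << 1) ^ (n >> 63)) & MASK64


def zigzag_decode64(z: int) -> int:
    """Inverse of zigzag_encode64."""
    return (z >> 1) ^ -(z & 1)


encoded = [zigzag_encode64(n) for n in (0, -1, 1, -2, 2)]
```

With a plain two's complement varint, -1 would take ten bytes; after the ZigZag mapping it becomes 1 and fits in a single byte.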