|
ABSTRACT
RDF is a data representation format for schema-free structured information that is gaining momentum in the context of Semantic-Web corpora, life sciences, and also Web 2.0 platforms. The "pay-as-you-go" nature of RDF and the flexible pattern-matching capabilities of its query language SPARQL entail efficiency and scalability challenges for complex queries including long join paths. This paper presents the RDF-3X engine, an implementation of SPARQL that achieves excellent performance by pursuing a RISC-style architecture with a streamlined architecture and carefully designed, puristic data structures and operations. The salient points of RDF-3X are: 1) a generic solution for storing and indexing RDF triples that completely eliminates the need for physical-design tuning, 2) a powerful yet simple query processor that leverages fast merge joins to the largest possible extent, and 3) a query optimizer for choosing optimal join orders using a cost model based on statistical synopses for entire join paths. The performance of RDF-3X, in comparison to the previously best state-of-the-art systems, has been measured on several large-scale datasets with more than 50 million RDF triples and benchmark queries that include pattern matching and long join paths in the underlying data graphs.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
S. Alexaki et al. The ics-forth rdfsuite: Managing voluminous rdf description bases. In SemWeb, 2001.
|
 |
3
|
|
| |
4
|
S. Auer et al. Dbpedia: A nucleus for a web of open data. In ISWC/ASWC, 2007.
|
| |
5
|
L. Baolin and H. Bo. Path queries based rdf index. In SKG, 2005.
|
| |
6
|
L. Baolin and H. Bo. Hprd: A high performance rdf database. In NPC, 2007.
|
| |
7
|
BioPAX: Biological Pathways Exchange. http://www.biopax.org/.
|
| |
8
|
|
 |
9
|
|
| |
10
|
|
| |
11
|
|
 |
12
|
|
 |
13
|
|
| |
14
|
|
| |
15
|
|
 |
16
|
|
 |
17
|
|
 |
18
|
Alon Halevy , Michael Franklin , David Maier, Principles of dataspace systems, Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.1-9, June 26-28, 2006, Chicago, IL, USA
[doi> 10.1145/1142351.1142352]
|
| |
19
|
A. Harth, J. Umbrich, A. Hogan, and S. Decker. Yars2: A federated repository for querying graph structured data from the web. In ISWC/ASWC, 2007.
|
| |
20
|
|
| |
21
|
A. Hogan and A. Harth. The expertfinder corpus 2007 for the benchmarking development of expert-finding systems. In International ExpertFinder Workshop, 2007.
|
| |
22
|
|
| |
23
|
Jena: a Semantic Web Framework for Java. http://jena.sourceforge.net/.
|
| |
24
|
M. Kersten and A. P. Siebes. An organic database system. Technical report, CWI, 1999.
|
 |
25
|
Angela Maduko , Kemafor Anyanwu , Amit Sheth , Paul Schliekelman, Estimating the cardinality of RDF graph patterns, Proceedings of the 16th international conference on World Wide Web, May 08-12, 2007, Banff, Alberta, Canada
[doi> 10.1145/1242572.1242782]
|
| |
26
|
|
| |
27
|
Monetdb. http://monetdb.cwi.nl/.
|
| |
28
|
Openrdf. http://www.openrdf.org/index.jsp.
|
| |
29
|
Oracle technical network, semantic technologies center. http://www.oracle.com/technology/tech/semantic_technologies/index.html.
|
| |
30
|
W3C: Resource Description Framework (RDF). http://www.w3.org/RDF/.
|
| |
31
|
RDFizers. http://simile.mit.edu/wiki/RDFizers.
|
| |
32
|
ICS-FORTH RDF suite. http://athena.ics.forth.gr:9090/RDF/.
|
 |
33
|
|
 |
34
|
P. Griffiths Selinger , M. M. Astrahan , D. D. Chamberlin , R. A. Lorie , T. G. Price, Access path selection in a relational database management system, Proceedings of the 1979 ACM SIGMOD international conference on Management of data, May 30-June 01, 1979, Boston, Massachusetts
[doi> 10.1145/582095.582099]
|
 |
35
|
David Simmen , Eugene Shekita , Timothy Malkemus, Fundamental techniques for order optimization, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.57-67, June 04-06, 1996, Montreal, Quebec, Canada
|
| |
36
|
W3C: SPARQL Query Language for RDF. http://www.w3.org/TR/rdf-sparql-query/.
|
| |
37
|
|
 |
38
|
Markus Stocker , Andy Seaborne , Abraham Bernstein , Christoph Kiefer , Dave Reynolds, SPARQL basic graph pattern optimization using selectivity estimation, Proceeding of the 17th international conference on World Wide Web, April 21-25, 2008, Beijing, China
[doi> 10.1145/1367497.1367578]
|
| |
39
|
M. Stonebraker et al. One size fits all? part 2: Benchmarking studies. In CIDR, 2007.
|
 |
40
|
|
| |
41
|
Y. Theoharis, V. Christophides, and G. Karvounarakis. Benchmarking database representations of rdf/s stores. In International Semantic Web Conference, 2005.
|
| |
42
|
O. Udrea, A. Pugliese, and V. S. Subrahmanian. Grin: A graph based rdf index. In AAAI, 2007.
|
| |
43
|
uniprot RDF. http://dev.isb-sib.ch/projects/uniprot-rdf/.
|
| |
44
|
|
 |
45
|
|
| |
46
|
K. Wilkinson et al. Efficient rdf storage and retrieval in jena2. In SWDB, 2003.
|
| |
47
|
W3C: RDF/OWL representation of WordNet. http://www.w3.org/TR/wordnet-rdf/.
|
| |
48
|
Yars2. http://sw.deri.org/svn/sw/2004/06/yars.
|
| |
49
|
F. Zhu, X. Yan, J. Han, and P. S. Yu. gprune: A constraint pushing framework for graph pattern mining. In PAKDD, 2007.
|
 |
50
|
|
|