|
ABSTRACT
With the proliferation of the RDF data format, engines for RDF query processing are faced with very large graphs that contain hundreds of millions of RDF triples. This paper addresses the resulting scalability problems. Recent prior work along these lines has focused on indexing and other physical-design issues. The current paper focuses on join processing, as the fine-grained and schema-relaxed use of RDF often entails star- and chain-shaped join queries with many input streams from index scans. We present two contributions for scalable join processing. First, we develop very light-weight methods for sideways information passing between separate joins at query run-time, to provide highly effective filters on the input streams of joins. Second, we improve previously proposed algorithms for join-order optimization by more accurate selectivity estimations for very large RDF graphs. Experimental studies with several RDF datasets, including the UniProt collection, demonstrate the performance gains of our approach, outperforming the previously fastest systems by more than an order of magnitude.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
S. Auer et al. Dbpedia: A nucleus for a web of open data. In ISWC/ASWC, 2007.
|
 |
3
|
|
 |
4
|
|
| |
5
|
Bio2RDF, semantic web atlas of postgenomic knowledge about human and mouse. http://www.bio2rdf.org/.
|
| |
6
|
|
| |
7
|
J. Broekstra et al. Sesame: An architecture for storing and querying rdf data and schema information. In Spinning the Semantic Web, 2003.
|
| |
8
|
|
| |
9
|
Dbpedia 3.2 downloads. http://wiki.dbpedia.org/Downloads32.
|
 |
10
|
|
| |
11
|
|
| |
12
|
Open directory RDF dump. http://rdf.dmoz.org/.
|
| |
13
|
Metaweb technologies: Freebase data dumps. http://download.freebase.com/datadumps/.
|
| |
14
|
|
 |
15
|
|
 |
16
|
Alon Halevy , Michael Franklin , David Maier, Principles of dataspace systems, Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.1-9, June 26-28, 2006, Chicago, IL, USA
[doi> 10.1145/1142351.1142352]
|
| |
17
|
A. Harth et al. Yars2: A federated repository for querying graph structured data from the web. In ISWC/ASWC, 2007.
|
| |
18
|
|
| |
19
|
Jena: a Semantic Web Framework for Java. http://jena.sourceforge.net/.
|
 |
20
|
|
 |
21
|
Angela Maduko , Kemafor Anyanwu , Amit Sheth , Paul Schliekelman, Estimating the cardinality of RDF graph patterns, Proceedings of the 16th international conference on World Wide Web, May 08-12, 2007, Banff, Alberta, Canada
[doi> 10.1145/1242572.1242782]
|
| |
22
|
|
| |
23
|
MonetDB. http://monetdb.cwi.nl/.
|
 |
24
|
|
| |
25
|
|
| |
26
|
OpenRDF. http://www.openrdf.org/index.jsp.
|
| |
27
|
PostgreSQL. http://www.postgresql.org/.
|
| |
28
|
RDF-3X. http://www.mpi-inf.mpg.de/~neumann/rdf3x.
|
| |
29
|
Semantic web challenge 2008. billion triples track. http://challenge.semanticweb.org/.
|
 |
30
|
Praveen Seshadri , Joseph M. Hellerstein , Hamid Pirahesh , T. Y. Cliff Leung , Raghu Ramakrishnan , Divesh Srivastava , Peter J. Stuckey , S. Sudarshan, Cost-based optimization for magic: algebra and implementation, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.435-446, June 04-06, 1996, Montreal, Quebec, Canada
|
| |
31
|
|
| |
32
|
|
| |
33
|
|
 |
34
|
Markus Stocker , Andy Seaborne , Abraham Bernstein , Christoph Kiefer , Dave Reynolds, SPARQL basic graph pattern optimization using selectivity estimation, Proceeding of the 17th international conference on World Wide Web, April 21-25, 2008, Beijing, China
[doi> 10.1145/1367497.1367578]
|
 |
35
|
|
| |
36
|
O. Udrea, A. Pugliese, and V. S. Subrahmanian. Grin: A graph based rdf index. In AAAI, 2007.
|
| |
37
|
Uniprot RDF. http://dev.isb-sib.ch/projects/uniprot-rdf/.
|
| |
38
|
|
| |
39
|
K. Wilkinson et al. Efficient rdf storage and retrieval in jena2. In SWDB, 2003.
|
 |
40
|
|
|