ACM Home Page
Please provide us with feedback. Feedback
Scalable join processing on very large RDF graphs
Full text PdfPdf (569 KB)
Source
International Conference on Management of Data archive
Proceedings of the 35th SIGMOD international conference on Management of data table of contents
Providence, Rhode Island, USA
SESSION: Research session 16: query processing on semi-structured data table of contents
Pages 627-640  
Year of Publication: 2009
ISBN:978-1-60558-551-2
Authors
Thomas Neumann  Max-Planck Institute for Informatics, Saarbrücken, Germany
Gerhard Weikum  Max-Planck Institute for Informatics, Saarbrücken, Germany
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): n/a,   Downloads (12 Months): n/a,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1559845.1559911
What is a DOI?

ABSTRACT

With the proliferation of the RDF data format, engines for RDF query processing are faced with very large graphs that contain hundreds of millions of RDF triples. This paper addresses the resulting scalability problems. Recent prior work along these lines has focused on indexing and other physical-design issues. The current paper focuses on join processing, as the fine-grained and schema-relaxed use of RDF often entails star- and chain-shaped join queries with many input streams from index scans.

We present two contributions for scalable join processing. First, we develop very light-weight methods for sideways information passing between separate joins at query run-time, to provide highly effective filters on the input streams of joins. Second, we improve previously proposed algorithms for join-order optimization by more accurate selectivity estimations for very large RDF graphs. Experimental studies with several RDF datasets, including the UniProt collection, demonstrate the performance gains of our approach, outperforming the previously fastest systems by more than an order of magnitude.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
S. Auer et al. Dbpedia: A nucleus for a web of open data. In ISWC/ASWC, 2007.
3
4
 
5
Bio2RDF, semantic web atlas of postgenomic knowledge about human and mouse. http://www.bio2rdf.org/.
 
6
 
7
J. Broekstra et al. Sesame: An architecture for storing and querying rdf data and schema information. In Spinning the Semantic Web, 2003.
 
8
 
9
Dbpedia 3.2 downloads. http://wiki.dbpedia.org/Downloads32.
10
 
11
 
12
Open directory RDF dump. http://rdf.dmoz.org/.
 
13
Metaweb technologies: Freebase data dumps. http://download.freebase.com/datadumps/.
 
14
15
16
 
17
A. Harth et al. Yars2: A federated repository for querying graph structured data from the web. In ISWC/ASWC, 2007.
 
18
 
19
Jena: a Semantic Web Framework for Java. http://jena.sourceforge.net/.
20
21
 
22
 
23
MonetDB. http://monetdb.cwi.nl/.
24
 
25
 
26
OpenRDF. http://www.openrdf.org/index.jsp.
 
27
PostgreSQL. http://www.postgresql.org/.
 
28
RDF-3X. http://www.mpi-inf.mpg.de/~neumann/rdf3x.
 
29
Semantic web challenge 2008. billion triples track. http://challenge.semanticweb.org/.
30
 
31
 
32
 
33
34
35
 
36
O. Udrea, A. Pugliese, and V. S. Subrahmanian. Grin: A graph based rdf index. In AAAI, 2007.
 
37
Uniprot RDF. http://dev.isb-sib.ch/projects/uniprot-rdf/.
 
38
 
39
K. Wilkinson et al. Efficient rdf storage and retrieval in jena2. In SWDB, 2003.
40

Collaborative Colleagues:
Thomas Neumann: colleagues
Gerhard Weikum: colleagues