Answering XML queries by means of data summaries
Source
ACM Transactions on Information Systems (TOIS)
Volume 25, Issue 3 (July 2007)
Article No. 10
Year of Publication: 2007
ISSN: 1046-8188
Authors
Elena Baralis  Politecnico di Torino, Torino, Italy
Paolo Garza  Politecnico di Torino, Torino, Italy
Elisa Quintarelli  Politecnico di Milano, Milano, Italy
Letizia Tanca  Politecnico di Milano, Milano, Italy
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1247715.1247716

ABSTRACT

XML is a rather verbose representation of semistructured data, which may require huge amounts of storage space. We propose a summarized representation of XML data, based on the concept of instance pattern, which can both provide succinct information and be directly queried. The physical representation of instance patterns exploits itemsets or association rules to summarize the content of XML datasets. Instance patterns may be used to answer queries, possibly only partially, either when fast and approximate answers are required or when the actual dataset is not available, for example because it is currently unreachable. Experiments on large XML documents show that instance patterns allow a significant reduction in storage space while preserving almost entirely the completeness of the query result. Furthermore, they provide fast query answers and scale well with the size of the dataset, thus overcoming the document size limitation of most current XQuery engines.
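
To make the idea of summarizing XML content with association rules more concrete, here is a minimal Python sketch. It is not the authors' algorithm or storage format: the toy document, the tag names, the helper functions (`transactions`, `mine_rules`, `answer`) and the thresholds are all assumptions chosen for illustration, and only rules between pairs of items are mined. Each record becomes a set of (tag, value) items, frequent pairs are turned into rules with a support and a confidence, and a query is then answered from the rules alone, without touching the document.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations

# Toy document (hypothetical data, not taken from the paper's experiments).
XML = """
<dblp>
  <article><author>Rossi</author><journal>TOIS</journal><year>2007</year></article>
  <article><author>Rossi</author><journal>TOIS</journal><year>2006</year></article>
  <article><author>Rossi</author><journal>TOIS</journal><year>2007</year></article>
  <article><author>Bianchi</author><journal>VLDBJ</journal><year>2007</year></article>
</dblp>
"""

def transactions(xml_text, record_tag):
    """Turn every <record_tag> element into a set of (tag, value) items."""
    root = ET.fromstring(xml_text)
    return [
        frozenset((child.tag, (child.text or "").strip()) for child in rec)
        for rec in root.iter(record_tag)
    ]

def mine_rules(txns, min_support=0.5, min_confidence=0.9):
    """Mine rules body -> head between single items (pairs only, for brevity)."""
    n = len(txns)
    single = Counter(item for t in txns for item in t)
    pairs = Counter(p for t in txns for p in combinations(sorted(t), 2))
    rules = []
    for (a, b), count in pairs.items():
        support = count / n
        if support < min_support:
            continue  # the pair is too rare to be kept in the summary
        for body, head in ((a, b), (b, a)):
            confidence = count / single[body]
            if confidence >= min_confidence:
                rules.append((body, head, support, confidence))
    return rules

def answer(rules, body_item, head_tag):
    """Answer 'which head_tag values go with body_item?' from the rules only."""
    return sorted(
        (head[1], confidence)
        for body, head, _, confidence in rules
        if body == body_item and head[0] == head_tag
    )

TXNS = transactions(XML, "article")
RULES = mine_rules(TXNS)

print(answer(RULES, ("author", "Rossi"), "journal"))    # answered from the summary
print(answer(RULES, ("author", "Bianchi"), "journal"))  # below min_support: no answer
```

The second query returns nothing because the single Bianchi record falls below the support threshold, which illustrates why answers obtained from such a summary may be partial.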



