ABSTRACT
XML is a rather verbose representation of semistructured data, which may require huge amounts of storage space. We propose a summarized representation of XML data, based on the concept of instance pattern, which can both provide succinct information and be directly queried. The physical representation of instance patterns exploits itemsets or association rules to summarize the content of XML datasets. Instance patterns may be used to answer queries, possibly only partially, either when fast and approximate answers are required or when the actual dataset is not available, for example because it is currently unreachable. Experiments on large XML documents show that instance patterns significantly reduce storage space while almost entirely preserving the completeness of query results. Furthermore, they provide fast query answers and scale well with dataset size, thus overcoming the document-size limitations of most current XQuery engines.
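To make the idea of querying a summary instead of the document concrete, here is a minimal sketch under loudly stated assumptions. It is not the authors' instance-pattern model or mining algorithm: it assumes each child of the root element is one record, flattens each record into a set of (tag, value) items, keeps only itemsets of size 1 and 2 whose support reaches a threshold, and then answers a simple "which values of tag Y co-occur with X = v?" query from the summary alone, reporting each answer as an association rule with its support and confidence. The names `record_items`, `summarize`, `approx_query` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations
import xml.etree.ElementTree as ET

# Hypothetical sample: each <article> is treated as one record.
SAMPLE = """
<dblp>
  <article><author>alice</author><venue>ICDE</venue></article>
  <article><author>alice</author><venue>ICDE</venue></article>
  <article><author>alice</author><venue>VLDB</venue></article>
  <article><author>bob</author><venue>EDBT</venue></article>
</dblp>
"""

def record_items(record):
    # Assumption: a record's direct children become (tag, text) items.
    return frozenset((child.tag, (child.text or "").strip()) for child in record)

def summarize(xml_text, min_support):
    """Count itemsets of size 1 and 2; keep those with support >= min_support."""
    root = ET.fromstring(xml_text.strip())
    counts = Counter()
    for record in root:
        items = sorted(record_items(record))
        for size in (1, 2):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {s: c for s, c in counts.items() if c >= min_support}

def approx_query(summary, given, target_tag):
    """Answer from the summary only: rules given => (target_tag, value).

    Returns {value: (support, confidence)}. Infrequent combinations were
    pruned, so the answer may be partial.
    """
    answers = {}
    for itemset, support in summary.items():
        if len(itemset) == 2 and given in itemset:
            other = itemset[1] if itemset[0] == given else itemset[0]
            if other[0] == target_tag:
                # A frequent pair implies its single items are frequent too.
                answers[other[1]] = (support, support / summary[(given,)])
    return answers

summary = summarize(SAMPLE, min_support=2)
print(approx_query(summary, ("author", "alice"), "venue"))
```

With `min_support=2`, the rule author = alice ⇒ venue = ICDE survives (support 2, confidence 2/3), while the single VLDB record is pruned. This shows in miniature the trade-off the abstract describes: a smaller summary that still answers the query, at the cost of some completeness.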