ABSTRACT
We consider the problem of efficiently producing ranked results for keyword search queries over hyperlinked XML documents. Evaluating keyword search queries over hierarchical XML documents, as opposed to (conceptually) flat HTML documents, introduces many new challenges. First, XML keyword search queries do not always return entire documents, but can return deeply nested XML elements that contain the desired keywords. Second, the nested structure of XML implies that the notion of ranking is no longer at the granularity of a document, but at the granularity of an XML element. Finally, the notion of keyword proximity is more complex in the hierarchical XML data model. In this paper, we present the XRANK system, which is designed to handle these novel features of XML keyword search. Our experimental results show that XRANK offers both space and performance benefits when compared with existing approaches. An interesting feature of XRANK is that it naturally generalizes a hyperlink-based HTML search engine such as Google. XRANK can thus be used to query a mix of HTML and XML documents.