ABSTRACT
Most of the work on XML query and search has stemmed from the publishing and database communities, largely to serve the needs of business applications. Recently, the Information Retrieval community has begun investigating XML search to answer information discovery needs. Following this trend, we present an approach in which information needs can be expressed approximately as pieces of XML documents, or "XML fragments", of the same nature as the documents being searched. We present an extension of the vector space model for searching XML collections via XML fragments and ranking results by relevance, and we describe how we extended a full-text search engine to comply with this model. The value of the proposed method is demonstrated by the relatively high precision of our system, which was among the top performers in the recent INEX workshop. Our results indicate that certain queries are more appropriate than others for the extended vector space model. Specifically, queries with relatively specific contexts but vague information needs are best positioned to benefit from this model. Finally, our results show that one method may not fit all types of queries, and that it could be worthwhile to use different solutions for different applications.