ABSTRACT
Techniques for automatic query expansion have been extensively studied in information retrieval as a means of addressing the word mismatch between queries and documents. These techniques can be categorized as either global or local. Global techniques rely on analysis of a whole collection to discover word relationships, whereas local techniques emphasize analysis of the top-ranked documents retrieved for a query. Although local techniques have been shown to be more effective than global techniques in general, existing local techniques are not robust and can seriously hurt retrieval when few of the retrieved documents are relevant. We propose a new technique, called local context analysis, which selects expansion terms based on cooccurrence with the query terms within the top-ranked documents. Experiments on a number of collections, both English and non-English, show that local context analysis offers more effective and consistent retrieval results.
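The idea of selecting expansion terms by cooccurrence with the query terms in the top-ranked documents can be illustrated with a minimal sketch. It assumes whitespace tokenization and a raw cooccurrence count as the score; the abstract does not give the actual scoring function of local context analysis, so this is a simplified stand-in. The function name `expansion_terms` and the parameter `k` are hypothetical.

```python
from collections import Counter


def expansion_terms(query_terms, top_docs, k=5):
    """Return up to k candidate expansion terms.

    Assumption: a term's score is the sum, over the top-ranked documents,
    of (its frequency in the document) * (number of query-term occurrences
    in the same document). This is an illustrative score, not the paper's.
    """
    query = {q.lower() for q in query_terms}
    scores = Counter()
    for doc in top_docs:
        counts = Counter(doc.lower().split())
        # How many times any query term occurs in this document.
        query_hits = sum(counts[q] for q in query)
        if query_hits == 0:
            continue  # no query term here, so nothing cooccurs with the query
        for term, tf in counts.items():
            if term not in query:
                scores[term] += tf * query_hits
    return [term for term, _ in scores.most_common(k)]


top_docs = [
    "query expansion improves retrieval recall",
    "expansion terms help retrieval",
    "cooking pasta recipes",
]
best = expansion_terms(["expansion"], top_docs, k=1)
```

In this toy run, `best` holds the single term that cooccurs with "expansion" in the most documents, and terms from the third document are never scored because that document contains no query term.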
CITED BY 85
Hang Cui, Ji-Rong Wen, Jian-Yun Nie, Wei-Ying Ma, Probabilistic query expansion using query logs, Proceedings of the 11th international conference on World Wide Web, May 07-11, 2002, Honolulu, Hawaii, USA
Qiankun Zhao, Steven C. H. Hoi, Tie-Yan Liu, Sourav S. Bhowmick, Michael R. Lyu, Wei-Ying Ma, Time-dependent semantic similarity measure of queries using historical click-through data, Proceedings of the 15th international conference on World Wide Web, May 23-26, 2006, Edinburgh, Scotland
Mark Maybury, Warren Greiff, Stanley Boykin, Jay Ponte, Chad Mchenry, Lisa Ferro, Personalcasting: Tailored Broadcast News, User Modeling and User-Adapted Interaction, v.14 n.1, p.119-144, February 2004
Dmitri Roussinov, Leon J. Zhao, Weiguo Fan, Mining context specific similarity relationships using the world wide web, Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, p.499-506, October 06-08, 2005, Vancouver, British Columbia, Canada
Min Song, Il Yeol Song, Robert B. Allen, Zoran Obradovic, Keyphrase extraction-based query expansion in digital libraries, Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries, June 11-15, 2006, Chapel Hill, NC, USA
Lingpeng Yang, Donghong Ji, Guodong Zhou, Yu Nie, Guozheng Xiao, Document re-ranking using cluster validation and label propagation, Proceedings of the 15th ACM international conference on Information and knowledge management, November 06-11, 2006, Arlington, Virginia, USA
Andrei Z. Broder, Marcus Fontoura, Evgeniy Gabrilovich, Amruta Joshi, Vanja Josifovski, Tong Zhang, Robust classification of rare queries using web knowledge, Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, July 23-27, 2007, Amsterdam, The Netherlands
Filip Radlinski, Andrei Broder, Peter Ciccolo, Evgeniy Gabrilovich, Vanja Josifovski, Lance Riedel, Optimizing relevance and revenue in ad search: a query substitution approach, Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, July 20-24, 2008, Singapore, Singapore
Victor Lavrenko, James Allan, Edward DeGuzman, Daniel LaFlamme, Veera Pollard, Stephen Thomas, Relevance models for topic detection and tracking, Proceedings of the second international conference on Human Language Technology Research, March 24-27, 2002, San Diego, California
Francesco Bonchi, Carlos Castillo, Debora Donato, Aristides Gionis, Topical query decomposition, Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2008, Las Vegas, Nevada, USA
Andrei Z. Broder, Peter Ciccolo, Marcus Fontoura, Evgeniy Gabrilovich, Vanja Josifovski, Lance Riedel, Search advertising using web relevance feedback, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA
Andrei Broder, Peter Ciccolo, Evgeniy Gabrilovich, Vanja Josifovski, Donald Metzler, Lance Riedel, Jeffrey Yuan, Online expansion of rare queries for sponsored search, Proceedings of the 18th international conference on World wide web, April 20-24, 2009, Madrid, Spain
Evgeniy Gabrilovich, Andrei Broder, Marcus Fontoura, Amruta Joshi, Vanja Josifovski, Lance Riedel, Tong Zhang, Classifying search queries using the Web as a source of knowledge, ACM Transactions on the Web (TWEB), v.3 n.2, p.1-28, April 2009
REVIEW
"Karen Sparck-Jones : Reviewer"
This good, solid paper addresses the word mismatch problem (that
is, different words for a single concept) with query expansion, using
the local context supplied by top-ranked documents in a presearch to
identify good term associations. This s