ABSTRACT
Word mismatch represents a fundamental information retrieval challenge that has become increasingly important as electronic document repositories (e.g., Web resources, digital libraries) grow in number and sheer volume. In general, word mismatch refers to the phenomenon in which a concept is described by different terms in user queries and in source documents. Query expansion represents a promising avenue to address such problems. Previous research predominantly approaches query expansion on the basis of global or local analysis. However, these approaches emphasize a global perspective rather than taking a topic-specific view of term associations. As a consequence, their effectiveness can be severely constrained when the document corpus spans a diverse set of topics. In this study, we propose a topic-based approach to query expansion, and we develop and empirically evaluate two novel methods, namely nonfuzzy and fuzzy topic-based query expansion, to address word mismatch problems. According to our evaluation results, the proposed topic-based approach is more effective than a benchmark global analysis method, particularly when user queries consist of multiple query terms.