| Why spectral retrieval works |
| Full text |
Pdf
(209 KB)
|
| Source
|
Annual ACM Conference on Research and Development in Information Retrieval
archive
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
table of contents
Salvador, Brazil
SESSION: Theory 1
table of contents
Pages: 11 - 18
Year of Publication: 2005
ISBN:1-59593-034-5
|
|
Authors
|
|
| Sponsor |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 7, Downloads (12 Months): 82, Citation Count: 8
|
|
|
ABSTRACT
We argue that the ability to identify pairs of related terms is at the heart of what makes spectral retrieval work in practice. Schemes such as latent semantic indexing (LSI) and its descendants have this ability in the sense that they can be viewed as computing a matrix of term-term relatedness scores which is then used to expand the given documents (not the queries). For almost all existing spectral retrieval schemes, this matrix of relatedness scores depends on a fixed low-dimensional subspace of the original term space. We instead vary the dimension and study for each term pair the resultin curve of relatedness scores. We find that it is actually the shape of this curve which is indicative for the term-pair relatedness, and not any of the individual relatedness scores on the curve. We derive two simple, parameterless algorithms that detect this shape and that consistently outperform previous methods on a number of test collections. Our curves also shed light on the effectiveness of three fundamental types of variations of the basic LSI scheme.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
|
 |
3
|
Yossi Azar , Amos Fiat , Anna Karlin , Frank McSherry , Jared Saia, Spectral analysis of data, Proceedings of the thirty-third annual ACM symposium on Theory of computing, p.619-626, July 2001, Hersonissos, Greece
[doi> 10.1145/380752.380859]
|
| |
4
|
S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. Indexing by latent semantic analysis. Journal of the American Society of Information Science, 41(6):391--407, 1990.
|
 |
5
|
|
 |
6
|
|
| |
7
|
G. Dupret. Latent semantic indexing with a variable number of orthogonal factors. In Proceedings RIAO'04, 2004.
|
| |
8
|
|
| |
9
|
G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins, third edition, 1996.
|
| |
10
|
William Hersh , Chris Buckley , T. J. Leone , David Hickam, OHSUMED: an interactive retrieval evaluation and new large test collection for research, Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval, p.192-201, July 03-06, 1994, Dublin, Ireland
|
| |
11
|
|
| |
12
|
|
| |
13
|
S. D. Kamvar, D. Klein, and C. D. Manning. Spectral learning. In Proceedings IJCAI'03, 2003.
|
| |
14
|
A. Kontostathis and W. M. Pottenger. Detecting patterns in the LSI term-term matrix. In Proceedings ICDM'02 Workshop on Foundations of Data Mining and Discovery, 2002.
|
| |
15
|
D. Lewis. Reuters-21578 text categorization test collection, 1997.
|
| |
16
|
A. Ng, M. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In Proceedings NIPS'01, 2001.
|
 |
17
|
Christos H. Papadimitriou , Hisao Tamaki , Prabhakar Raghavan , Santosh Vempala, Latent semantic indexing: a probabilistic analysis, Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.159-168, June 01-04, 1998, Seattle, Washington, United States
[doi> 10.1145/275487.275505]
|
 |
18
|
|
 |
19
|
|
CITED BY 8
|
|
|
|
|
Xuanhui Wang , Jian-Tao Sun , Zheng Chen , ChengXiang Zhai, Latent semantic analysis for multiple-type interrelated data objects, Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, August 06-11, 2006, Seattle, Washington, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|