ACM Home Page
Please provide us with feedback. Feedback
Why spectral retrieval works
Full text PdfPdf (209 KB)
Source Annual ACM Conference on Research and Development in Information Retrieval archive
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval table of contents
Salvador, Brazil
SESSION: Theory 1 table of contents
Pages: 11 - 18  
Year of Publication: 2005
ISBN:1-59593-034-5
Authors
Holger Bast  Max-Planck-Institut für Informatik, Saarbrücken, Germany
Debapriyo Majumdar  Max-Planck-Institut für Informatik, Saarbrücken, Germany
Sponsor
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 7,   Downloads (12 Months): 82,   Citation Count: 8
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1076034.1076040
What is a DOI?

ABSTRACT

We argue that the ability to identify pairs of related terms is at the heart of what makes spectral retrieval work in practice. Schemes such as latent semantic indexing (LSI) and its descendants have this ability in the sense that they can be viewed as computing a matrix of term-term relatedness scores which is then used to expand the given documents (not the queries). For almost all existing spectral retrieval schemes, this matrix of relatedness scores depends on a fixed low-dimensional subspace of the original term space. We instead vary the dimension and study for each term pair the resultin curve of relatedness scores. We find that it is actually the shape of this curve which is indicative for the term-pair relatedness, and not any of the individual relatedness scores on the curve. We derive two simple, parameterless algorithms that detect this shape and that consistently outperform previous methods on a number of test collections. Our curves also shed light on the effectiveness of three fundamental types of variations of the basic LSI scheme.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
3
 
4
S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. Indexing by latent semantic analysis. Journal of the American Society of Information Science, 41(6):391--407, 1990.
5
6
 
7
G. Dupret. Latent semantic indexing with a variable number of orthogonal factors. In Proceedings RIAO'04, 2004.
 
8
 
9
G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins, third edition, 1996.
 
10
 
11
 
12
 
13
S. D. Kamvar, D. Klein, and C. D. Manning. Spectral learning. In Proceedings IJCAI'03, 2003.
 
14
A. Kontostathis and W. M. Pottenger. Detecting patterns in the LSI term-term matrix. In Proceedings ICDM'02 Workshop on Foundations of Data Mining and Discovery, 2002.
 
15
D. Lewis. Reuters-21578 text categorization test collection, 1997.
 
16
A. Ng, M. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In Proceedings NIPS'01, 2001.
17
18
19

CITED BY  8

Collaborative Colleagues:
Holger Bast: colleagues
Debapriyo Majumdar: colleagues