Variable latent semantic indexing
Source: International Conference on Knowledge Discovery and Data Mining
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Chicago, Illinois, USA
SESSION: Research track paper
Pages: 13 - 21  
Year of Publication: 2005
ISBN:1-59593-135-X
Authors
Anirban Dasgupta  Cornell University, Ithaca, NY
Ravi Kumar  IBM Almaden Research Center, San Jose, CA
Prabhakar Raghavan  Yahoo!, Research Labs, Sunnyvale, CA
Andrew Tomkins  IBM Almaden Research Center, San Jose, CA
Sponsors
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1081870.1081876

ABSTRACT

Latent Semantic Indexing is a classical method to produce optimal low-rank approximations of a term-document matrix. However, in the context of a particular query distribution, the approximation thus produced need not be optimal. We propose VLSI, a new query-dependent (or "variable") low-rank approximation that minimizes approximation error for any specified query distribution. With this tool, it is possible to tailor the LSI technique to particular settings, often resulting in vastly improved approximations at much lower dimensionality. We validate this method via a series of experiments on classical corpora, showing that VLSI typically performs similarly to LSI with an order of magnitude fewer dimensions.
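The abstract does not spell out how the query-dependent approximation is built, so the following is only a minimal sketch of one natural way to formalize the idea, not the authors' algorithm. It assumes a terms-by-documents matrix `A`, a sample of query vectors `Q` (one column per query), and that the error of an approximation is the mean squared difference in document scores over those queries. Under that assumption the weighted problem reduces to an ordinary truncated SVD after rescaling by the square root of the query second-moment matrix. The function names and the `eps` regularizer are hypothetical choices made for illustration.

```python
import numpy as np


def rank_k_factors(M, k):
    """Return factors (left, right) of the best rank-k approximation of M (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] * s[:k], Vt[:k, :]


def lsi_approx(A, k):
    """Classical LSI: truncated SVD of the term-document matrix A."""
    left, right = rank_k_factors(A, k)
    return left @ right


def query_weighted_approx(A, Q, k, eps=1e-8):
    """Rank-k approximation weighted by a query sample Q (terms x queries).

    Assumption: we minimize the mean over queries q of ||q^T (A - A_hat)||^2,
    which equals ||C^(1/2) (A - A_hat)||_F^2 with C = mean of q q^T.
    eps (hypothetical) keeps C invertible when some terms are never queried.
    """
    C = Q @ Q.T / Q.shape[1]                  # second-moment matrix of the queries
    w, V = np.linalg.eigh(C)                  # C is symmetric positive semidefinite
    w = np.clip(w, 0.0, None) + eps
    C_half = (V * np.sqrt(w)) @ V.T           # C^(1/2)
    C_half_inv = (V / np.sqrt(w)) @ V.T       # C^(-1/2)
    left, right = rank_k_factors(C_half @ A, k)
    return (C_half_inv @ left) @ right        # undo the rescaling


def expected_query_error(A, A_hat, Q):
    """Mean squared error in document scores over the sampled queries."""
    R = Q.T @ (A - A_hat)
    return float(np.mean(np.sum(R ** 2, axis=1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((30, 20))         # toy term-document matrix
    Q = np.zeros((30, 200))
    Q[:5, :] = rng.standard_normal((5, 200))  # queries touch only the first 5 terms
    for name, approx in [("LSI", lsi_approx(A, 2)),
                         ("query-weighted", query_weighted_approx(A, Q, 2))]:
        print(name, expected_query_error(A, approx, Q))
```

When the queries are spread uniformly over all terms, `C` is a multiple of the identity and the weighted approximation coincides with plain LSI; the two differ only when the query distribution favors some terms over others.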


