Variable latent semantic indexing
Source
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining (International Conference on Knowledge Discovery and Data Mining)
Chicago, Illinois, USA
SESSION: Research track paper
Pages: 13 - 21
Year of Publication: 2005
ISBN: 1-59593-135-X
ABSTRACT
Latent Semantic Indexing is a classical method for producing optimal low-rank approximations of a term-document matrix. However, in the context of a particular query distribution, the approximation thus produced need not be optimal. We propose VLSI, a new query-dependent (or "variable") low-rank approximation that minimizes approximation error for any specified query distribution. With this tool, it is possible to tailor the LSI technique to particular settings, often resulting in vastly improved approximations at much lower dimensionality. We validate this method via a series of experiments on classical corpora, showing that VLSI typically matches the performance of LSI while using an order of magnitude fewer dimensions.
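The contrast drawn in the abstract can be illustrated with a small NumPy sketch. It is not taken from the paper and should not be read as the authors' algorithm. It assumes the error of interest is the expected squared error of the query-document scores, E||q^T (A - B)||^2, which equals trace((A - B)^T C (A - B)) with C = E[q q^T]. Under that assumption, one rank-k minimizer is B = A V_k V_k^T, where V_k holds the top k right singular vectors of C^{1/2} A; this follows from the Eckart-Young theorem applied to C^{1/2} A. The function names, the synthetic matrix, and the synthetic query distribution are all hypothetical.

```python
import numpy as np

def lsi_approx(A, k):
    """Classical LSI: rank-k truncated SVD of the term-document matrix A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

def query_weighted_approx(A, C, k):
    """Rank-k B minimizing trace((A-B)^T C (A-B)) (assumed objective).

    C is the query second-moment matrix E[q q^T] (terms x terms).
    """
    # Symmetric PSD square root of C via its eigendecomposition.
    w, Q = np.linalg.eigh(C)
    w = np.clip(w, 0.0, None)          # guard against tiny negative eigenvalues
    C_half = (Q * np.sqrt(w)) @ Q.T
    # Top-k right singular vectors of the query-weighted matrix C^{1/2} A.
    _, _, Vt = np.linalg.svd(C_half @ A, full_matrices=False)
    Vk = Vt[:k, :].T
    # Project the documents (columns of A) onto that k-dimensional subspace.
    return A @ Vk @ Vk.T

def expected_query_error(A, B, C):
    """E||q^T (A - B)||^2 expressed through C = E[q q^T]."""
    E = A - B
    return float(np.trace(E.T @ C @ E))

# Synthetic data: 30 terms, 20 documents, queries that touch only 5 terms.
rng = np.random.default_rng(0)
A = rng.random((30, 20))
queries = np.zeros((200, 30))
queries[:, :5] = rng.random((200, 5))
C = queries.T @ queries / len(queries)

k = 3
B_lsi = lsi_approx(A, k)
B_query = query_weighted_approx(A, C, k)
err_lsi = expected_query_error(A, B_lsi, C)
err_query = expected_query_error(A, B_query, C)
print(f"LSI error under queries:            {err_lsi:.4f}")
print(f"query-weighted error under queries: {err_query:.4f}")
```

At equal rank, the query-weighted approximation can never have a larger expected query error than the truncated SVD, while the truncated SVD remains at least as good in the plain Frobenius norm. That trade-off is the one the abstract describes.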