Learning probabilistic models of the Web (poster session)
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Athens, Greece
Pages: 369-371
Year of Publication: 2000
ISBN: 1-58113-226-3
Author
Thomas Hofmann, Department of Computer Science, Box 1910, Brown University, Providence, RI
ABSTRACT
In the World Wide Web, myriad hyperlinks connect documents and pages to create an unprecedented, highly complex graph structure: the Web graph. This paper presents a novel approach to learning probabilistic models of the Web, which can be used to make reliable predictions about the connectivity and information content of Web documents. The proposed method is a probabilistic dimension reduction technique that recasts and unites Latent Semantic Analysis and Kleinberg's Hubs-and-Authorities algorithm in a statistical setting.
This is meant to be a first step towards the development of a statistical foundation for Web-related information technologies. Although this paper does not focus on a particular application, a variety of algorithms operating in the Web/Internet environment can take advantage of the presented techniques, including search engines, Web crawlers, and information agent systems.
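The abstract describes the method only at a high level. As a rough illustration of what a probabilistic dimension reduction of link data can look like, the following is a minimal Python sketch of a latent-class ("aspect") model of hyperlinks, fitted with the EM algorithm of Dempster et al. [4]. It assumes that each link from page s to page t is generated by first drawing a hidden factor z and then drawing s and t independently given z, so that P(s, t) = sum_z P(z) P(s|z) P(t|z). The per-factor source and target distributions play a role loosely analogous to hub and authority scores. This is an assumed formulation for illustration, not necessarily the exact model of the paper: it uses links only and ignores document content, and the function names `fit_link_aspect_model` and `log_likelihood` are hypothetical.

```python
import math
import random


def fit_link_aspect_model(links, n_pages, n_factors, n_iters=50, seed=0):
    """Fit P(z), P(s|z), P(t|z) to observed (source, target) links with EM.

    Assumed model (illustrative only): P(s, t) = sum_z P(z) P(s|z) P(t|z).
    links: list of (s, t) integer pairs with 0 <= s, t < n_pages; repeats count.
    """
    rng = random.Random(seed)

    def random_dist(n):
        # Strictly positive random initialization, normalized to sum to 1.
        w = [rng.random() + 1e-3 for _ in range(n)]
        total = sum(w)
        return [x / total for x in w]

    p_z = [1.0 / n_factors] * n_factors
    p_s = [random_dist(n_pages) for _ in range(n_factors)]  # outgoing ("hub-like") profile per factor
    p_t = [random_dist(n_pages) for _ in range(n_factors)]  # incoming ("authority-like") profile per factor

    for _ in range(n_iters):
        cnt_z = [0.0] * n_factors
        cnt_s = [[0.0] * n_pages for _ in range(n_factors)]
        cnt_t = [[0.0] * n_pages for _ in range(n_factors)]
        for s, t in links:
            # E-step: posterior P(z | s, t) is proportional to P(z) P(s|z) P(t|z).
            joint = [p_z[z] * p_s[z][s] * p_t[z][t] for z in range(n_factors)]
            norm = sum(joint)
            for z in range(n_factors):
                r = joint[z] / norm
                cnt_z[z] += r
                cnt_s[z][s] += r
                cnt_t[z][t] += r
        # M-step: turn the expected counts back into normalized distributions.
        p_z = [c / len(links) for c in cnt_z]
        p_s = [[c / cnt_z[z] for c in cnt_s[z]] for z in range(n_factors)]
        p_t = [[c / cnt_z[z] for c in cnt_t[z]] for z in range(n_factors)]
    return p_z, p_s, p_t


def log_likelihood(links, p_z, p_s, p_t):
    """Log-likelihood of the observed links under the fitted model."""
    return sum(
        math.log(sum(p_z[z] * p_s[z][s] * p_t[z][t] for z in range(len(p_z))))
        for s, t in links
    )


# Usage: two disjoint link communities, {0,1} -> {2,3} and {4,5} -> {6,7}.
links = [(0, 2), (0, 3), (1, 2), (1, 3), (4, 6), (4, 7), (5, 6), (5, 7)]
p_z, p_s, p_t = fit_link_aspect_model(links, n_pages=8, n_factors=2)
print("P(z):", [round(p, 3) for p in p_z])
print("log-likelihood:", round(log_likelihood(links, p_z, p_s, p_t), 3))
```

Because each EM iteration cannot decrease the likelihood of the observed links, running more iterations from the same initialization should never yield a worse fit; the number of factors is a modeling choice left to the user in this sketch.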
REFERENCES
[3] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41:391-407, 1990.
[4] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Statist. Soc. B, 39:1-38, 1977.
[9] L. Saul and F. Pereira. Aggregate and mixed-order Markov models for statistical language processing. In Proceedings of the 2nd International Conference on Empirical Methods in Natural Language Processing, pages 81-89, 1997.
CITED BY 3
Allan Borodin, Gareth O. Roberts, Jeffrey S. Rosenthal, and Panayiotis Tsaparas. Link analysis ranking: algorithms, theory, and experiments. ACM Transactions on Internet Technology (TOIT), 5(1):231-297, February 2005.