Data mining for hypertext: a tutorial survey
Source: ACM SIGKDD Explorations Newsletter, Volume 1, Issue 2 (January 2000)
Column: Survey articles
Pages: 1-11
Year of Publication: 2000
ISSN: 1931-0145
Author: Soumen Chakrabarti, Indian Institute of Technology Bombay
Publisher: ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/846183.846187

ABSTRACT

With over 800 million pages covering most areas of human endeavor, the World-wide Web is a fertile ground for data mining research to make a difference to the effectiveness of information search. Today, Web surfers access the Web through two dominant interfaces: clicking on hyperlinks and searching via keyword queries. This process is often tentative and unsatisfactory. Better support is needed for expressing one's information need and for dealing with a search result in more structured ways than are available now. Data mining and machine learning have significant roles to play towards this end.

In this paper we survey recent advances in learning and mining problems related to hypertext in general and the Web in particular. We review the continuum from supervised to semi-supervised to unsupervised learning problems, highlight the specific challenges that distinguish data mining in the hypertext domain from data mining in the context of data warehouses, and summarize the key areas of recent and ongoing research.


