PEBL: positive example based learning for Web page classification using SVM
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Edmonton, Alberta, Canada
SESSION: Web page classification
Pages: 239 - 248
Year of Publication: 2002
ISBN: 1-58113-567-X
ABSTRACT
Web page classification is one of the essential techniques for Web mining. Specifically, classifying Web pages that belong to a class of interest to the user is the first step in mining interesting information from the Web. However, constructing a classifier for such a class requires laborious pre-processing, such as collecting positive and negative training examples. For instance, in order to construct a "homepage" classifier, one needs to collect a sample of homepages (positive examples) and a sample of non-homepages (negative examples). In particular, collecting negative training examples requires arduous work and special caution to avoid biasing them. We introduce in this paper the Positive Example Based Learning (PEBL) framework for Web page classification, which eliminates the need for manually collecting negative training examples in pre-processing. We present an algorithm called Mapping-Convergence (M-C) that achieves classification accuracy (with positive and unlabeled data) as high as that of traditional SVM (with positive and negative data). Our experiments show that when the M-C algorithm uses the same number of positive examples as traditional SVM, the M-C algorithm performs as well as traditional SVM.
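The abstract names the Mapping-Convergence (M-C) algorithm but does not spell out its steps, so the following is only a minimal sketch of one plausible reading: a mapping stage that picks "strong negatives" out of the unlabeled pages, then a convergence stage that retrains a classifier and absorbs newly rejected unlabeled pages until nothing changes. Everything here is an assumption for illustration: the function names, the toy data, the feature-frequency rule used for the mapping stage, and above all the nearest-centroid scorer, which is a dependency-free stand-in for the SVM used in the paper.

```python
from collections import Counter

def doc_freq(docs):
    """Fraction of documents in which each feature occurs."""
    counts = Counter()
    for d in docs:
        counts.update(set(d))
    return {f: n / len(docs) for f, n in counts.items()}

def train(pos, neg):
    """Stand-in classifier (NOT an SVM): compare summed feature weights."""
    cp, cn = doc_freq(pos), doc_freq(neg)
    def is_positive(doc):
        p = sum(cp.get(f, 0.0) for f in set(doc))
        n = sum(cn.get(f, 0.0) for f in set(doc))
        return p > n
    return is_positive

def mapping_convergence(pos, unl):
    # Mapping stage (assumed rule): a feature is "positive" if it is
    # relatively more frequent in the positives than in the unlabeled set.
    pf, uf = doc_freq(pos), doc_freq(unl)
    feats = {f for f in pf if pf[f] > uf.get(f, 0.0)}
    # Unlabeled pages with no positive feature become strong negatives.
    neg = [d for d in unl if not feats & set(d)]
    rest = [d for d in unl if feats & set(d)]
    if not neg:
        raise ValueError("mapping stage found no strong negatives")
    # Convergence stage: retrain, move rejected pages into the negatives,
    # and stop once no remaining unlabeled page is rejected.
    while True:
        clf = train(pos, neg)
        rejected = [d for d in rest if not clf(d)]
        if not rejected:
            return clf
        neg = neg + rejected
        rest = [d for d in rest if clf(d)]

# Hypothetical toy data: each page is a set of tokens.
positives = [{"home", "welcome", "my"}, {"home", "page", "my"},
             {"welcome", "my", "page"}]
unlabeled = [{"home", "welcome", "my", "page"}, {"price", "cart", "buy"},
             {"cart", "buy", "shipping"}, {"my", "cart", "buy"},
             {"news", "sports", "score"}]

classifier = mapping_convergence(positives, unlabeled)
print(classifier({"my", "home"}), classifier({"buy", "cart", "price"}))
```

No negative examples are supplied by hand in this sketch: the negatives are drawn entirely from the unlabeled set, which is the property the abstract emphasizes. Swapping the stand-in scorer for a real SVM would change only `train`.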