PEBL: positive example based learning for Web page classification using SVM
Source: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining (International Conference on Knowledge Discovery and Data Mining)
Edmonton, Alberta, Canada
Session: Web page classification
Pages: 239-248
Year of Publication: 2002
ISBN:1-58113-567-X
Authors
Hwanjo Yu  University of Illinois, Urbana-Champaign, IL
Jiawei Han  University of Illinois, Urbana-Champaign, IL
Kevin Chen-Chuan Chang  University of Illinois, Urbana-Champaign, IL
Sponsors
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
AAAI
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 7,   Downloads (12 Months): 127,   Citation Count: 27
DOI: 10.1145/775047.775083 (http://doi.acm.org/10.1145/775047.775083)

ABSTRACT

Web page classification is one of the essential techniques for Web mining. Specifically, classifying Web pages of a class that interests the user is the first step in mining interesting information from the Web. However, constructing a classifier for such a class requires laborious pre-processing, such as collecting positive and negative training examples. For instance, to construct a "homepage" classifier, one needs to collect a sample of homepages (positive examples) and a sample of non-homepages (negative examples). Collecting negative training examples, in particular, requires arduous work and special care to avoid biasing them. In this paper we introduce the Positive Example Based Learning (PEBL) framework for Web page classification, which eliminates the need to collect negative training examples manually during pre-processing. We present an algorithm called Mapping-Convergence (M-C) that achieves classification accuracy (with positive and unlabeled data) as high as that of traditional SVM (with positive and negative data). Our experiments show that when the M-C algorithm uses the same number of positive examples as traditional SVM, it performs as well as traditional SVM.
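The abstract states what M-C achieves but not how it works. What follows is a hypothetical, simplified sketch of two-stage learning from positive and unlabeled pages in that spirit. It is not the authors' implementation. A "mapping" step labels unlabeled pages that contain none of the positive-indicative features as strong negatives. A "convergence" loop then retrains a linear SVM on the positives versus the negatives found so far and moves unlabeled pages it predicts as negative into the negative set, stopping when no new negatives appear. The document-frequency rule for positive features, the subgradient-trained SVM, the set-of-words page representation, and every name and parameter (`positive_features`, `lam`, `lr`, `epochs`) are assumptions chosen for illustration.

```python
from typing import List, Set, Tuple

Doc = Set[str]  # hypothetical representation: a Web page as a set of word features


def dot(w: List[float], x: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(xs: List[List[float]], ys: List[int], lam: float = 0.01,
                     lr: float = 0.1, epochs: int = 200) -> Tuple[List[float], float]:
    """Linear soft-margin SVM: full-batch subgradient descent on
    (lam / 2) * ||w||^2 + mean hinge loss. Deterministic (starts at w = 0)."""
    n, dim = len(xs), len(xs[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw = [lam * wi for wi in w]  # gradient of the L2 regularizer
        gb = 0.0
        for x, y in zip(xs, ys):
            if y * (dot(w, x) + b) < 1:  # inside the margin: hinge term is active
                for j in range(dim):
                    gw[j] -= y * x[j] / n
                gb -= y / n
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def positive_features(pos: List[Doc], unl: List[Doc]) -> Set[str]:
    """Assumed rule: features with higher document frequency in positives than in unlabeled pages."""
    def df(docs: List[Doc], f: str) -> float:
        return sum(f in d for d in docs) / len(docs)
    return {f for f in set().union(*pos) if df(pos, f) > df(unl, f)}


def mapping_convergence(pos: List[Doc], unl: List[Doc]):
    vocab = sorted(set().union(*pos, *unl))

    def vec(d: Doc) -> List[float]:
        return [1.0 if f in d else 0.0 for f in vocab]

    # Mapping: unlabeled pages sharing no positive feature become strong negatives.
    pf = positive_features(pos, unl)
    negatives = [d for d in unl if not (d & pf)]
    remaining = [d for d in unl if d & pf]
    # Convergence: retrain on positives vs. negatives found so far and absorb newly
    # predicted negatives; each round shrinks `remaining`, so the loop terminates.
    while True:
        xs = [vec(d) for d in pos + negatives]
        ys = [1] * len(pos) + [-1] * len(negatives)
        w, b = train_linear_svm(xs, ys)
        new_neg = [d for d in remaining if dot(w, vec(d)) + b < 0]
        if not new_neg:
            return w, b, vocab, negatives
        negatives += new_neg
        remaining = [d for d in remaining if dot(w, vec(d)) + b >= 0]


def predict(w: List[float], b: float, vocab: List[str], doc: Doc) -> int:
    x = [1.0 if f in doc else 0.0 for f in vocab]
    return 1 if dot(w, x) + b >= 0 else -1


if __name__ == "__main__":
    pos = [{"home", "about", "me"}, {"home", "my", "resume"}, {"about", "me", "resume"}]
    unl = [{"home", "about", "me", "hobbies"}, {"buy", "price", "cart"},
           {"price", "shipping", "cart"}, {"news", "today", "price"},
           {"my", "resume", "home"}, {"buy", "price", "cart", "me"}]
    w, b, vocab, negatives = mapping_convergence(pos, unl)
    print(len(negatives), predict(w, b, vocab, {"home", "about", "me"}))
```

Representing pages as word sets keeps the sketch dependency-free. A practical system would more likely use weighted term vectors and an SVM library. On the toy data, the mapping step cannot label the page {"buy", "price", "cart", "me"} because it contains "me". The first convergence round absorbs it as a negative, and the homepage-like unlabeled pages stay on the positive side.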

