Web classification using support vector machine
Source
Workshop On Web Information And Data Management
Proceedings of the 4th international workshop on Web information and data management
McLean, Virginia, USA
SESSION: Web mining, tools, and performance evaluation
Pages: 96-99
Year of Publication: 2002
ISBN: 1-58113-593-9
Authors
Aixin Sun, Nanyang Technological University, Singapore
Ee-Peng Lim, Nanyang Technological University, Singapore
Wee-Keong Ng, Nanyang Technological University, Singapore
ABSTRACT
In web classification, web pages from one or more web sites are assigned to predefined categories according to their content. Because web pages are more than plain text documents, web classification methods also have to consider context features of web pages, such as hyperlinks and HTML tags. In this paper, we propose the use of Support Vector Machine (SVM) classifiers to classify web pages using both their text and context feature sets. We evaluated our web classification method on the WebKB data set. Compared with the earlier Foil-Pilfs method on the same data set, our method performs very well. We also show that the use of context features, especially hyperlinks, can improve classification performance significantly.
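The idea of classifying pages from both text and context features can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the feature sets, so the choice of title words and anchor words as context features, the `CONTEXT_TAGS` mapping, the toy pages, and the Pegasos-style trainer used in place of a full SVM package are all assumptions made for illustration.

```python
import random
import re
from collections import Counter
from html.parser import HTMLParser

# Assumed mapping from HTML tags to context-feature namespaces (hypothetical).
CONTEXT_TAGS = {"title": "title", "a": "anchor"}


class PageFeatureParser(HTMLParser):
    """Count words separately for plain text, <title> text, and <a> anchor text."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()
        self.open_tags = []  # context tags that are currently open

    def handle_starttag(self, tag, attrs):
        if tag in CONTEXT_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        namespace = CONTEXT_TAGS[self.open_tags[-1]] if self.open_tags else "text"
        for word in re.findall(r"[a-z]+", data.lower()):
            self.counts[f"{namespace}:{word}"] += 1


def extract_features(html):
    """Return one sparse vector holding both text and context features."""
    parser = PageFeatureParser()
    parser.feed(html)
    parser.close()
    return parser.counts


def score(weights, features):
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def train_linear_svm(examples, lam=0.01, epochs=50, seed=0):
    """Pegasos-style stochastic subgradient descent on the hinge loss (no bias term)."""
    rng = random.Random(seed)
    weights = {}
    step = 0
    for _ in range(epochs):
        for features, label in rng.sample(examples, len(examples)):
            step += 1
            eta = 1.0 / (lam * step)
            violated = label * score(weights, features) < 1
            for name in weights:
                weights[name] *= 1.0 - 1.0 / step  # equals 1 - eta * lam
            if violated:
                for name, value in features.items():
                    weights[name] = weights.get(name, 0.0) + eta * label * value
    return weights


def predict(weights, features):
    return 1 if score(weights, features) > 0 else -1


# Toy pages (invented for illustration): +1 = course page, -1 = faculty page.
pages = [
    ("<title>Algorithms course</title><p>Homework and lecture schedule</p>"
     "<a href='hw1.html'>assignment one</a>", 1),
    ("<title>Databases course</title><p>Lecture notes, exam dates</p>"
     "<a href='s.html'>syllabus</a>", 1),
    ("<title>Professor Smith</title><p>Research interests: networks</p>"
     "<a href='pubs.html'>publications</a>", -1),
    ("<title>Professor Jones</title><p>Research group members</p>"
     "<a href='cv.html'>curriculum vitae</a>", -1),
]
examples = [(extract_features(html), label) for html, label in pages]
weights = train_linear_svm(examples)

new_page = "<title>Compilers course</title><p>Lecture slides</p>"
print(predict(weights, extract_features(new_page)))
```

Prefixing each word with its namespace keeps a word in a title or link distinct from the same word in body text, so the classifier can weight context evidence separately from text evidence.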
CITED BY 24
Pável Calado, Marco Cristo, Edleno Moura, Nivio Ziviani, Berthier Ribeiro-Neto, Marcos André Gonçalves, Combining link-based and content-based methods for web document classification, Proceedings of the twelfth international conference on Information and knowledge management, November 03-08, 2003, New Orleans, LA, USA
Baoping Zhang, Yuxin Chen, Weiguo Fan, Edward A. Fox, Marcos Gonçalves, Marco Cristo, Pável Calado, Intelligent GP fusion from multiple sources for text classification, Proceedings of the 14th ACM international conference on Information and knowledge management, October 31-November 05, 2005, Bremen, Germany
Thierson Couto, Marco Cristo, Marcos André Gonçalves, Pável Calado, Nivio Ziviani, Edleno Moura, Berthier Ribeiro-Neto, A comparative study of citations and links in document classification, Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries, June 11-15, 2006, Chapel Hill, NC, USA
Jian-Tao Sun, Ben-Yu Zhang, Zheng Chen, Yu-Chang Lu, Chun-Yi Shi, Wei-Ying Ma, GE-CKO: A Method to Optimize Composite Kernels for Web Page Classification, Proceedings of the 2004 IEEE/WIC/ACM International Conference on Web Intelligence, p.299-305, September 20-24, 2004
Adriano Veloso, Wagner Meira, Jr., Marco Cristo, Marcos Gonçalves, Mohammed Zaki, Multi-evidence, multi-criteria, lazy associative document classification, Proceedings of the 15th ACM international conference on Information and knowledge management, November 06-11, 2006, Arlington, Virginia, USA
B. Barla Cambazoglu, Evren Karaca, Tayfun Kucukyilmaz, Ata Turk, Cevdet Aykanat, Architecture of a grid-enabled Web search engine, Information Processing and Management: an International Journal, v.43 n.3, p.609-623, May, 2007