An experimental study on large-scale web categorization
Source: International World Wide Web Conference
Special interest tracks and posters of the 14th international conference on World Wide Web
Chiba, Japan
POSTER SESSION: Posters
Pages: 1106-1107
Year of Publication: 2005
ISBN: 1-59593-051-5
Authors
Tie-Yan LIU  Microsoft Research Asia, Beijing, P. R. China
Yiming YANG  Carnegie Mellon University, PA
Hao WAN  Tsinghua University, Beijing, P. R. China
Qian ZHOU  Tsinghua University, Beijing, P. R. China
Bin GAO  Peking University, Beijing, P. R. China
Hua-Jun ZENG  Microsoft Research Asia, Beijing, P. R. China
Zheng CHEN  Microsoft Research Asia, Beijing, P. R. China
Wei-Ying MA  Microsoft Research Asia, Beijing, P. R. China
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1062745.1062891

ABSTRACT

Web taxonomies typically have hundreds of thousands of categories and a highly skewed distribution of documents over those categories. It is not clear whether existing text classification technologies can perform well on, and scale up to, such large-scale applications. To find out, we evaluated several representative methods (Support Vector Machines, k-Nearest Neighbor and Naive Bayes) on Yahoo! taxonomies. In particular, we evaluated the effectiveness/efficiency tradeoff of classifiers in a hierarchical setting compared with the conventional (flat) setting, and tested popular threshold tuning strategies for their scalability and accuracy in large-scale classification problems.
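The abstract does not name the threshold tuning strategies that were tested. As a minimal sketch, assuming a per-category strategy in the style of SCut (each category's decision threshold is chosen to maximize F1 on held-out validation scores), the Python below shows how such tuning can be done with a single sort-and-sweep per category, which keeps the cost at O(n log n) per category when there are very many categories. The function names, data layout and category names are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def scut_threshold(scores: Sequence[float], labels: Sequence[bool]) -> Tuple[float, float]:
    """Return (threshold, F1) maximizing F1 for one category on validation data.

    A document is assigned to the category when its score >= threshold.
    math.inf means "assign nothing" (used when no threshold beats F1 = 0).
    """
    total_pos = sum(1 for y in labels if y)
    # Sweep candidate thresholds from the highest score downwards.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    best_t, best_f1 = math.inf, 0.0
    tp = fp = 0
    i, n = 0, len(pairs)
    while i < n:
        s = pairs[i][0]
        # Absorb every document tied at score s: all of them cross the threshold together.
        while i < n and pairs[i][0] == s:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        fn = total_pos - tp
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:  # strict '>' keeps the highest threshold among ties
            best_t, best_f1 = s, f1
    return best_t, best_f1


def tune_thresholds(val_scores: Dict[str, List[float]],
                    val_labels: Dict[str, List[bool]]) -> Dict[str, float]:
    """Tune one threshold per category independently (hypothetical helper)."""
    return {c: scut_threshold(val_scores[c], val_labels[c])[0] for c in val_scores}


def assign_categories(doc_scores: Dict[str, float],
                      thresholds: Dict[str, float]) -> List[str]:
    """Categories whose score reaches their tuned threshold, in sorted order."""
    return sorted(c for c, s in doc_scores.items()
                  if s >= thresholds.get(c, math.inf))


if __name__ == "__main__":
    t, f1 = scut_threshold([0.9, 0.8, 0.4, 0.3, 0.1],
                           [True, True, False, True, False])
    print(t, round(f1, 3))  # 0.3 0.857
    print(assign_categories({"Arts": 0.35, "Science": 0.5},
                            {"Arts": 0.3, "Science": 0.6}))  # ['Arts']
```

Because each category is tuned independently, the loop over categories parallelizes trivially; the tradeoff is that rare categories with few validation positives get noisy thresholds, which is one reason threshold strategies need to be checked on skewed, large-scale taxonomies.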



