An experimental study on large-scale web categorization

Source
International World Wide Web Conference
Special interest tracks and posters of the 14th international conference on World Wide Web
Chiba, Japan
POSTER SESSION: Posters
Pages: 1106-1107
Year of Publication: 2005
ISBN: 1-59593-051-5

Authors
Tie-Yan LIU, Microsoft Research Asia, Beijing, P. R. China
Yiming YANG, Carnegie Mellon University, PA
Hao WAN, Tsinghua University, Beijing, P. R. China
Qian ZHOU, Tsinghua University, Beijing, P. R. China
Bin GAO, Peking University, Beijing, P. R. China
Hua-Jun ZENG, Microsoft Research Asia, Beijing, P. R. China
Zheng CHEN, Microsoft Research Asia, Beijing, P. R. China
Wei-Ying MA, Microsoft Research Asia, Beijing, P. R. China

ABSTRACT
Web taxonomies typically have hundreds of thousands of categories and a highly skewed distribution of documents across those categories. It is unclear whether existing text classification techniques can perform well on, and scale up to, such large-scale applications. To investigate this, we evaluated several representative methods (Support Vector Machines, k-Nearest Neighbor, and Naive Bayes) on Yahoo! taxonomies. In particular, we evaluated the effectiveness/efficiency tradeoff of classifiers in a hierarchical setting compared with the conventional (flat) setting, and tested popular threshold-tuning strategies for their scalability and accuracy on large-scale classification problems.

CITED BY 4
Bin Gao, Tie-Yan Liu, Guang Feng, Tao Qin, Qian-Sheng Cheng, Wei-Ying Ma, Hierarchical Taxonomy Preparation for Text Categorization Using Consistent Bipartite Spectral Graph Copartitioning, IEEE Transactions on Knowledge and Data Engineering, v.17 n.9, p.1263-1273, September 2005