ABSTRACT
Support Vector Machines (SVMs) have obtained state-of-the-art results in many applications, including document classification. However, previous work applying SVMs to the F-term patent classification task did not achieve results as good as those of other learning algorithms such as kNN. This is because F-term patent classification differs from conventional document classification in several respects: it is a multiclass, multilabel classification problem with semi-structured documents and multi-faceted hierarchical categories. This article describes our SVM-based system and several techniques we developed to adapt SVMs to the specific features of the F-term patent classification task. We evaluate the techniques using the NTCIR-6 F-term classification terms assigned to Japanese patents. Our system participated in the NTCIR-6 patent classification evaluation and obtained the best results according to two of the three metrics used for task performance evaluation. Following the NTCIR-6 participation, we developed two new techniques, which achieved even better scores on all three NTCIR-6 metrics, outperforming all participating systems. This article presents this new work and the experimental results that demonstrate the benefits of the latest approach.