Adapting Support Vector Machines for F-term-based Classification of Patents
Source
ACM Transactions on Asian Language Information Processing (TALIP)
Volume 7, Issue 2 (June 2008)
Article No. 7
Year of Publication: 2008
ISSN: 1530-0226
Authors
Yaoyong Li, University of Sheffield, UK
Kalina Bontcheva, University of Sheffield, UK
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1362782.1362786

ABSTRACT

Support Vector Machines (SVMs) have achieved state-of-the-art results in many applications, including document classification. However, previous work applying SVMs to the F-term patent classification task did not match the results of other learning algorithms such as kNN. This is because F-term patent classification differs from conventional document classification in several respects, mainly in that it is a multiclass, multilabel classification problem with semi-structured documents and multi-faceted hierarchical categories.

This article describes our SVM-based system and several techniques we developed to adapt SVMs to the specific features of the F-term patent classification task. We evaluate the techniques using the NTCIR-6 F-term classification terms assigned to Japanese patents. Our system participated in the NTCIR-6 patent classification evaluation and obtained the best results according to two of the three metrics used for task performance evaluation. Following the NTCIR-6 participation, we developed two new techniques, which achieved even better scores on all three NTCIR-6 metrics, outperforming all participating systems. This article presents this new work and the experimental results that demonstrate the benefits of the latest approach.


