Large-scale, parallel automatic patent annotation
Source
Conference on Information and Knowledge Management
Proceedings of the 1st ACM workshop on Patent information retrieval
Napa Valley, California, USA
SESSION: Information extraction
Pages 1-8  
Year of Publication: 2008
ISBN:978-1-60558-256-6
Authors
Milan Agatonovic, University of Sheffield, Sheffield, United Kingdom
Niraj Aswani, University of Sheffield, Sheffield, United Kingdom
Kalina Bontcheva, University of Sheffield, Sheffield, United Kingdom
Hamish Cunningham, University of Sheffield, Sheffield, United Kingdom
Thomas Heitz, University of Sheffield, Sheffield, United Kingdom
Yaoyong Li, University of Sheffield, Sheffield, United Kingdom
Ian Roberts, University of Sheffield, Sheffield, United Kingdom
Valentin Tablan, University of Sheffield, Sheffield, United Kingdom
Sponsors
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1458572.1458574

ABSTRACT

When researching new product ideas or filing new patents, inventors need to retrieve all relevant pre-existing know-how and/or to exploit and enforce patents in their technological domain. However, this process is hindered by a lack of rich metadata which, if present, would allow more powerful concept-based search to complement the current keyword-based approach. This paper presents our approach to automatic patent enrichment, tested in large-scale, parallel experiments on USPTO and EPO documents. It starts by defining the metadata annotation task and examining its challenges. The text analysis tools are presented next, including details on the automatic annotation of sections, references and measurements. The key challenges encountered were dealing with ambiguities and errors in the data; creating and maintaining large, domain-independent dictionaries; and building an efficient, robust patent analysis pipeline capable of dealing with terabytes of data. The accuracy of the automatically created metadata is evaluated against a human-annotated gold standard, with results of over 90% on most annotation types.


