Using structured text for large-scale attribute extraction
Source
Conference on Information and Knowledge Management
Proceedings of the 17th ACM Conference on Information and Knowledge Management
Napa Valley, California, USA
SESSION: IR: structured documents
Pages 1183-1192  
Year of Publication: 2008
ISBN:978-1-59593-991-3
Authors
Sujith Ravi  University of Southern California, Marina del Rey, CA, USA
Marius Paşca  Google Inc., Mountain View, CA, USA
Sponsors
ACM: Association for Computing Machinery
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1458082.1458238

ABSTRACT

We propose a weakly-supervised approach for extracting class attributes from structured text available within Web documents. The overall precision of the extracted attributes is around 30% higher than with previous methods operating on Web documents. In addition to attribute extraction, this approach also automatically identifies values for a subset of the extracted class attributes.



