Using structured text for large-scale attribute extraction
Source: Proceedings of the 17th ACM Conference on Information and Knowledge Management, Napa Valley, California, USA
Session: IR: structured documents
Pages: 1183-1192
Year of Publication: 2008
ISBN: 978-1-59593-991-3
Authors
Sujith Ravi, University of Southern California, Marina del Rey, CA, USA
Marius Paşca, Google Inc., Mountain View, CA, USA
ABSTRACT
We propose a weakly-supervised approach for extracting class attributes from structured text available within Web documents. The overall precision of the extracted attributes is around 30% higher than with previous methods operating on Web documents. In addition to attribute extraction, this approach also automatically identifies values for a subset of the extracted class attributes.