Collective annotation of Wikipedia entities in web text
Source: International Conference on Knowledge Discovery and Data Mining
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Paris, France
SESSION: Research track papers
Pages 457-466
Year of Publication: 2009
ISBN: 978-1-60558-495-9
ABSTRACT
To take the first step beyond keyword-based search toward entity-based search, suitable token spans ("spots") in documents must be identified as references to real-world entities from an entity catalog. Several systems have been proposed to link spots on Web pages to entities in Wikipedia. They are largely based on local compatibility between the text around the spot and textual metadata associated with the entity. Two recent systems exploit inter-label dependencies, but in limited ways. We propose a general collective disambiguation approach. Our premise is that coherent documents refer to entities from one or a few related topics or domains. We give formulations for the trade-off between local spot-to-entity compatibility and measures of global coherence between entities. Optimizing the overall entity assignment is NP-hard. We investigate practical solutions based on local hill-climbing, rounding integer linear programs, and pre-clustering entities followed by local optimization within clusters. In experiments involving over a hundred manually annotated Web pages and tens of thousands of spots, our approaches significantly outperform recently proposed algorithms.
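The trade-off between local compatibility and global coherence, and the hill-climbing heuristic mentioned above, can be illustrated with a minimal sketch. It is not the paper's formulation: the additive objective, the `weight` parameter, the function names, and all scores and entity names below are assumptions invented for illustration.

```python
from itertools import combinations


def objective(assignment, local, coherence, weight=1.0):
    """Assumed objective: local spot-to-entity scores plus weighted pairwise coherence."""
    local_part = sum(local[spot][ent] for spot, ent in assignment.items())
    pair_part = sum(
        coherence.get(frozenset((a, b)), 0.0)
        for a, b in combinations(assignment.values(), 2)
    )
    return local_part + weight * pair_part


def hill_climb(local, coherence, weight=1.0):
    """Greedy local search: relabel one spot at a time while the objective improves."""
    # Start from the locally most compatible entity for every spot.
    assignment = {spot: max(cands, key=cands.get) for spot, cands in local.items()}
    best = objective(assignment, local, coherence, weight)
    improved = True
    while improved:
        improved = False
        for spot, cands in local.items():
            for ent in cands:
                if ent == assignment[spot]:
                    continue
                trial = {**assignment, spot: ent}
                score = objective(trial, local, coherence, weight)
                if score > best + 1e-12:  # accept strictly improving moves only
                    assignment, best = trial, score
                    improved = True
    return assignment


# Hypothetical toy data: spot -> {candidate entity: local compatibility score}.
LOCAL = {
    "jaguar": {"Jaguar_Cars": 0.55, "Jaguar_(animal)": 0.45},
    "amazon": {"Amazon_(company)": 0.60, "Amazon_rainforest": 0.40},
    "rainforest": {"Rainforest": 1.00},
}
# Hypothetical symmetric coherence scores between entity pairs (missing pairs count as 0).
COHERENCE = {
    frozenset(("Jaguar_(animal)", "Amazon_rainforest")): 0.9,
    frozenset(("Jaguar_(animal)", "Rainforest")): 0.8,
    frozenset(("Amazon_rainforest", "Rainforest")): 1.0,
    frozenset(("Jaguar_Cars", "Amazon_(company)")): 0.1,
}

result = hill_climb(LOCAL, COHERENCE)
print(result)
```

In this toy input the purely local choice picks the car maker and the company, while the coherence term pulls both ambiguous spots toward the rainforest topic. Hill-climbing of this kind can stall in a local optimum, which is consistent with the abstract's note that exact optimization is NP-hard and that several heuristics are investigated.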