Leveraging context in user-centric entity detection systems
Source
Conference on Information and Knowledge Management
Proceedings of the sixteenth ACM Conference on Information and Knowledge Management
Lisbon, Portugal
SESSION: Information representation and integration (KM)
Pages 691-700  
Year of Publication: 2007
ISBN:978-1-59593-803-9
Authors
Vadim von Brzeski  Yahoo!, Inc., Santa Clara, CA
Utku Irmak  Yahoo!, Inc., Santa Clara, CA
Reiner Kraft  Yahoo!, Inc., Santa Clara, CA
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 10,   Downloads (12 Months): 61,   Citation Count: 1
DOI: http://doi.acm.org/10.1145/1321440.1321537

ABSTRACT

A user-centric entity detection system is one in which the primary consumer of the detected entities is a person who can perform actions on the detected entities (e.g. perform a search, view a map, shop, etc.). We contrast this with machine-centric detection systems where the primary consumer of the detected entities is a machine. Machine-centric detection systems typically focus on the quantity of detected entities, measured by precision and recall metrics, with the goal of correctly identifying every single entity in a document.

However, the simple precision/recall scores of machine-centric entity detection systems fail to accurately reflect the quality of detected entities in user-centric systems, where users may not necessarily want to "see" every possible entity. We posit that not all of the detected entities in a given piece of text are necessarily relevant to the main topic of the text, nor are they necessarily interesting enough to the user to warrant further action. In fact, presenting all of the detected entities to a user may annoy the user to the point where he decides to turn this capability off completely, an undesirable outcome. Therefore, we propose to measure the quality and utility of user-centric entity detection systems in three core dimensions: the accuracy, the interestingness, and the relevance of the entities they present to the user. We show that leveraging surrounding context can greatly improve the performance of such systems in all three dimensions by employing novel algorithms for generating a concept vector and for finding concept extensions using search query logs.

We extensively evaluate the proposed algorithms within Contextual Shortcuts - a large-scale user-centric entity detection platform - using 1,586 entities detected over 1,519 documents. The results confirm the importance of using context within user-centric entity detection systems, and validate the usefulness of the proposed algorithms by showing how they improve the overall entity detection quality within Contextual Shortcuts.
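
As an illustration only, the idea of using surrounding context to decide which detected entities are worth showing can be sketched as below. The abstract does not describe the paper's concept vector or query-log algorithms, so this sketch substitutes a plain bag-of-words vector and cosine similarity. The names `concept_vector`, `cosine`, and `relevant_entities`, the per-entity description strings, and the threshold value are all hypothetical.

```python
import math
from collections import Counter


def concept_vector(text):
    """Toy stand-in for a concept vector: lowercase term counts.

    Assumption: the paper's real algorithm is not given in the abstract.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def relevant_entities(entities, context, threshold=0.2):
    """Keep only entities whose description resembles the context.

    `entities` maps an entity name to a short descriptive string
    (hypothetical input; a real system would derive this differently).
    """
    ctx = concept_vector(context)
    return [
        name
        for name, description in entities.items()
        if cosine(concept_vector(description), ctx) >= threshold
    ]


context = "the new phone camera and battery reviews"
detected = {
    "Phone X": "phone camera battery",
    "Lisbon": "city portugal travel",
}
shown = relevant_entities(detected, context)
```

Here both entities count as correct detections in a machine-centric sense, but only the one that overlaps with the surrounding context would be presented to the user.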



