Named entity mining from click-through data using weakly supervised latent Dirichlet allocation
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Paris, France
SESSION: Industrial track papers
Pages 1365-1374  
Year of Publication: 2009
ISBN: 978-1-60558-495-9
Authors
Gu Xu  Microsoft Research Asia, Beijing, China
Shuang-Hong Yang  Georgia Institute of Technology, Atlanta, GA, USA
Hang Li  Microsoft Research Asia, Beijing, China
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1557019.1557165

ABSTRACT

This paper addresses Named Entity Mining (NEM), in which we mine knowledge about named entities such as movies, games, and books from a huge amount of data. NEM is potentially useful in many applications, including web search, online advertising, and recommender systems. There are three challenges for the task: finding a suitable data source, coping with the ambiguities of named entity classes, and incorporating the necessary human supervision into the mining process. This paper proposes conducting NEM by using click-through data collected at a web search engine, employing a topic model that generates the click-through data, and learning the topic model by weak supervision from humans. Specifically, it characterizes each named entity by its associated queries and URLs in the click-through data. It uses the topic model to resolve ambiguities of named entity classes by representing the classes as topics. It employs a method, referred to as Weakly Supervised Latent Dirichlet Allocation (WS-LDA), to accurately learn the topic model from partially labeled named entities. Experiments on large-scale click-through data containing over 1.5 billion query-URL pairs show that the proposed approach can conduct very accurate NEM and significantly outperforms the baseline.
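
The idea of characterizing each named entity by its associated queries and URLs can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `build_entity_documents`, the `(query, url, clicks)` log format, the `#` placeholder that stands in for the entity inside a query, and the `q:` / `u:` feature prefixes are all assumptions made for illustration. The resulting per-entity bags of features are the kind of input a topic model could be trained on.

```python
from collections import Counter, defaultdict


def build_entity_documents(click_log, entities):
    """Collect, for each known entity, a bag of query contexts and clicked URLs.

    click_log: iterable of (query, url, clicks) tuples (hypothetical format).
    entities:  iterable of entity name strings to look for inside queries.
    Returns a dict mapping entity name -> Counter of feature counts.
    """
    docs = defaultdict(Counter)
    for query, url, clicks in click_log:
        # Pad with spaces so that matching happens on whole-word boundaries.
        padded_query = f" {query.lower()} "
        for entity in entities:
            padded_entity = f" {entity.lower()} "
            if padded_entity not in padded_query:
                continue
            # Replace the entity mention by a placeholder to get its context,
            # e.g. "harry potter trailer" -> "# trailer" (assumed convention).
            context = padded_query.replace(padded_entity, " # ").strip()
            docs[entity]["q:" + context] += clicks
            docs[entity]["u:" + url] += clicks
    return dict(docs)


if __name__ == "__main__":
    # Toy click-through log with made-up URLs and click counts.
    toy_log = [
        ("harry potter trailer", "movies.example.com/hp", 3),
        ("harry potter book", "books.example.com/hp", 2),
        ("weather paris", "weather.example.com", 5),
    ]
    for name, features in build_entity_documents(toy_log, ["Harry Potter"]).items():
        print(name, dict(features))
```

In this sketch an entity that appears with both movie-like and book-like contexts ends up with a mixed bag of features, which is the kind of class ambiguity the abstract says the topic model is meant to resolve.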


