| Named entity mining from click-through data using weakly supervised latent dirichlet allocation |
| Full text |
Mov
(13:39),
Pdf
(729 KB)
|
Source
|
International Conference on Knowledge Discovery and Data Mining
archive
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
table of contents
Paris, France
SESSION: Industrial track papers
table of contents
Pages 1365-1374
Year of Publication: 2009
ISBN:978-1-60558-495-9
|
|
Authors
|
|
Gu Xu
|
Microsoft Research Asia, Beijing, China
|
|
Shuang-Hong Yang
|
Georgia Institute of Technology, Atlanta, GA, USA
|
|
Hang Li
|
Microsoft Research Asia, Beijing, China
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 65, Downloads (12 Months): 158, Citation Count: 0
|
|
|
ABSTRACT
This paper addresses Named Entity Mining (NEM), in which we mine knowledge about named entities such as movies, games, and books from a huge amount of data. NEM is potentially useful in many applications including web search, online advertisement, and recommender system. There are three challenges for the task: finding suitable data source, coping with the ambiguities of named entity classes, and incorporating necessary human supervision into the mining process. This paper proposes conducting NEM by using click-through data collected at a web search engine, employing a topic model that generates the click-through data, and learning the topic model by weak supervision from humans. Specifically, it characterizes each named entity by its associated queries and URLs in the click-through data. It uses the topic model to resolve ambiguities of named entity classes by representing the classes as topics. It employs a method, referred to as Weakly Supervised Latent Dirichlet Allocation (WS-LDA), to accurately learn the topic model with partially labeled named entities. Experiments on a large scale click-through data containing over 1.5 billion query-URL pairs show that the proposed approach can conduct very accurate NEM and significantly outperforms the baseline.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
Daniel M. Bikel , Scott Miller , Richard Schwartz , Ralph Weischedel, Nymble: a high-performance learning name-finder, Proceedings of the fifth conference on Applied natural language processing, p.194-201, March 31-April 03, 1997, Washington, DC
[doi> 10.3115/974557.974586]
|
| |
3
|
|
| |
4
|
D. Blei and J. McAulie. Supervised topic models. In Advances in Neural Information Processing Systems 21(NIPS'07), MIT Press, 2007.
|
| |
5
|
|
 |
6
|
Huanhuan Cao , Daxin Jiang , Jian Pei , Qi He , Zhen Liao , Enhong Chen , Hang Li, Context-aware query suggestion by mining click-through and session data, Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2008, Las Vegas, Nevada, USA
[doi> 10.1145/1401890.1401995]
|
 |
7
|
Hang Cui , Ji-Rong Wen , Jian-Yun Nie , Wei-Ying Ma, Probabilistic query expansion using query logs, Proceedings of the 11th international conference on World Wide Web, May 07-11, 2002, Honolulu, Hawaii, USA
[doi> 10.1145/511446.511489]
|
 |
8
|
|
| |
9
|
Oren Etzioni , Michael Cafarella , Doug Downey , Ana-Maria Popescu , Tal Shaked , Stephen Soderland , Daniel S. Weld , Alexander Yates, Unsupervised named-entity extraction from the web: an experimental study, Artificial Intelligence, v.165 n.1, p.91-134, June 2005
[doi> 10.1016/j.artint.2005.03.001]
|
| |
10
|
|
 |
11
|
|
| |
12
|
|
 |
13
|
|
 |
14
|
|
| |
15
|
|
| |
16
|
|
 |
17
|
|
 |
18
|
|
 |
19
|
|
 |
20
|
|
| |
21
|
N. Ueda and K. Saito. Parametric mixture models for multi-labeled text. In Advances in Neural Information Processing Systems 15 (NIPS'03), pp.721--728. MIT Press, 2003.
|
 |
22
|
|
 |
23
|
|
|