Extracting structured information from user queries with semi-supervised conditional random fields
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Boston, MA, USA
SESSION: Query formulation
Pages 572-579
Year of Publication: 2009
ISBN: 978-1-60558-483-6
Authors
Xiao Li, Microsoft Research, Redmond, WA, USA
Ye-Yi Wang, Microsoft Research, Redmond, WA, USA
Alex Acero, Microsoft Research, Redmond, WA, USA
ABSTRACT
When users search over structured documents, it is beneficial to extract information from their queries in a format consistent with the backend data structure. As one step toward this goal, we study query tagging, the problem of assigning each query term to a predefined category. This problem could be approached by learning a conditional random field (CRF) model (or another statistical model) in a supervised fashion, but doing so would require substantial human annotation effort. In this work, we focus on a semi-supervised learning method for CRFs that uses two data sources: (1) a small number of manually labeled queries, and (2) a large number of queries in which some word tokens have derived labels, i.e., label information automatically obtained from additional resources. We present two principled ways of encoding derived label information in a CRF model: it is treated as hard evidence in one setting and as soft evidence in the other. In addition to this general methodology for using derived labels in semi-supervised CRFs, we present a practical method for obtaining them by leveraging user click data and an in-domain database of structured documents. Evaluation on product search queries shows that our approach improves tagging accuracy.
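The two ideas in the abstract, derived labels and their use as hard or soft evidence, can be illustrated with a toy sketch. It is not the authors' method: the paper encodes derived labels in CRF training, whereas this example only shows how they might constrain (hard) or bias (soft) Viterbi decoding in a linear-chain model. The tag set `LABELS`, the functions `derive_labels` and `viterbi`, the `bonus` parameter, and the matching rule against a clicked product record are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

LABELS = ["Brand", "Model", "Type", "Other"]  # hypothetical tag set


def derive_labels(query: List[str], record: Dict[str, str]) -> List[Optional[str]]:
    """Assumed heuristic: a token gets a field name as its derived label when it
    occurs in exactly one field of a clicked database record."""
    derived: List[Optional[str]] = []
    for token in query:
        fields = [f for f, value in record.items()
                  if f in LABELS and token in value.lower().split()]
        # Ambiguous or unmatched tokens stay unlabeled (None).
        derived.append(fields[0] if len(fields) == 1 else None)
    return derived


def viterbi(emit, trans, derived, mode="hard", bonus=2.0):
    """Best label path under additive scores.

    emit[t][y] is the score of label y at position t; trans[p][y] is the
    score of moving from label p to label y.
    mode="hard": positions with a derived label may only take that label.
    mode="soft": the derived label receives an additive bonus instead.
    """
    def local(t, y):
        score = emit[t][y]
        d = derived[t]
        if d is not None:
            if mode == "hard" and y != d:
                return float("-inf")  # rule out conflicting labels
            if mode == "soft" and y == d:
                score += bonus        # prefer, but do not force, the label
        return score

    best = [{y: local(0, y) for y in LABELS}]
    back = []
    for t in range(1, len(emit)):
        col, ptr = {}, {}
        for y in LABELS:
            prev = max(LABELS, key=lambda p: best[-1][p] + trans[p][y])
            col[y] = best[-1][prev] + trans[prev][y] + local(t, y)
            ptr[y] = prev
        best.append(col)
        back.append(ptr)

    # Trace back from the best final label.
    y = max(LABELS, key=lambda l: best[-1][l])
    path = [y]
    for ptr in reversed(back):
        y = ptr[y]
        path.append(y)
    return path[::-1]


# Toy usage: a model that prefers "Other" everywhere, plus one derived label.
record = {"Brand": "Canon", "Model": "EOS 5D", "Type": "digital camera"}
derived = derive_labels(["cheap", "canon"], record)
emit = [{y: 0.0 for y in LABELS} for _ in range(2)]
emit[0]["Other"] = 1.0
emit[1]["Other"] = 1.0
trans = {p: {y: 0.0 for y in LABELS} for p in LABELS}
hard_path = viterbi(emit, trans, derived, mode="hard")
weak_soft_path = viterbi(emit, trans, derived, mode="soft", bonus=0.5)
```

In this sketch the hard setting always honors the derived label, while the soft setting lets the model's own scores override it when the bonus is small, which is one way to tolerate noise in automatically obtained labels.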