ABSTRACT
In sequence modeling, we often wish to represent complex interactions between labels, such as when performing multiple, cascaded labeling tasks on the same sequence, or when long-range dependencies exist. We present dynamic conditional random fields (DCRFs), a generalization of linear-chain conditional random fields (CRFs) in which each time slice contains a set of state variables and edges, giving a distributed state representation as in dynamic Bayesian networks (DBNs), and in which parameters are tied across slices. Since exact inference can be intractable in such models, we perform approximate inference using several schedules for belief propagation, including tree-based reparameterization (TRP). On a natural-language chunking task, we show that a DCRF performs better than a series of linear-chain CRFs, achieving comparable performance using only half the training data.
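To make the structure described in the abstract concrete, the following is a minimal sketch of a two-chain model with a within-slice edge and parameters tied across slices. It is an illustration under stated assumptions, not the paper's implementation: the label sets, the weight table W, the observation sequence X, and all function names are hypothetical. The normalizer is computed by brute-force enumeration, which is exponential in the sequence length; this is the cost that motivates approximate inference, and the sketch does not implement belief propagation or TRP.

```python
import itertools
import math

# Hypothetical toy model: two label chains (for example, y = POS tags and
# z = chunk tags) over one observation sequence. All values are made up.
Y_LABELS = (0, 1)  # states of chain 1
Z_LABELS = (0, 1)  # states of chain 2


def slice_score(t, y, z, x, w):
    """Log-potential of time slice t; the same weights w are used at every t."""
    s = w["obs_y"][y[t]][x[t]]            # chain-1 label vs. observation
    s += w["obs_z"][z[t]][x[t]]           # chain-2 label vs. observation
    s += w["cotemporal"][y[t]][z[t]]      # within-slice edge between y_t and z_t
    if t > 0:
        s += w["trans_y"][y[t - 1]][y[t]]  # between-slice edge on chain 1
        s += w["trans_z"][z[t - 1]][z[t]]  # between-slice edge on chain 2
    return s


def log_score(y, z, x, w):
    """Unnormalized log-score of a joint labeling (y, z) given x."""
    return sum(slice_score(t, y, z, x, w) for t in range(len(x)))


def log_partition(x, w):
    """Brute-force log Z(x): enumerates every joint labeling of both chains."""
    n = len(x)
    scores = [
        log_score(y, z, x, w)
        for y in itertools.product(Y_LABELS, repeat=n)
        for z in itertools.product(Z_LABELS, repeat=n)
    ]
    m = max(scores)  # log-sum-exp with max subtraction for numerical stability
    return m + math.log(sum(math.exp(s - m) for s in scores))


def log_prob(y, z, x, w):
    """Conditional log-probability log p(y, z | x)."""
    return log_score(y, z, x, w) - log_partition(x, w)


# Illustrative weights and observations (assumptions, not values from the paper).
W = {
    "obs_y": [[0.5, -0.5], [-0.5, 0.5]],
    "obs_z": [[0.2, 0.0], [0.0, 0.2]],
    "cotemporal": [[1.0, 0.0], [0.0, 1.0]],
    "trans_y": [[0.3, 0.0], [0.0, 0.3]],
    "trans_z": [[0.3, 0.0], [0.0, 0.3]],
}
X = (0, 1, 1)
```

Calling log_prob((0, 1, 1), (0, 1, 1), X, W) returns the conditional log-probability of one joint labeling. Because W does not depend on t, the same function scores sequences of any length, which is what tying parameters across slices means in this sketch.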