ACM Home Page
Please provide us with feedback. Feedback
Dynamic conditional random fields: factorized probabilistic models for labeling and segmenting sequence data
Full text PdfPdf (170 KB)
Source ACM International Conference Proceeding Series; Vol. 69 archive
Proceedings of the twenty-first international conference on Machine learning table of contents
Banff, Alberta, Canada
Page: 99  
Year of Publication: 2004
ISBN:1-58113-828-5
Authors
Charles Sutton  University of Massachusetts, Amherst, MA
Khashayar Rohanimanesh  University of Massachusetts, Amherst, MA
Andrew McCallum  University of Massachusetts, Amherst, MA
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 4,   Downloads (12 Months): 24,   Citation Count: 30
Additional Information:

abstract   references   cited by   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1015330.1015422
What is a DOI?

ABSTRACT

In sequence modeling, we often wish to represent complex interaction between labels, such as when performing multiple, cascaded labeling tasks on the same sequence, or when long-range dependencies exist. We present dynamic conditional random fields (DCRFs), a generalization of linear-chain conditional random fields (CRFs) in which each time slice contains a set of state variables and edges---a distributed state representation as in dynamic Bayesian networks (DBNs)---and parameters are tied across slices. Since exact inference can be intractable in such models, we perform approximate inference using several schedules for belief propagation, including tree-based reparameterization (TRP). On a natural-language chunking task, we show that a DCRF performs better than a series of linear-chain CRFs, achieving comparable performance using only half the training data.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Aji, S., Horn, G., & McEliece, R. (1998). The convergence of iterative decoding on graphs with a single cycle. Proc. IEEE Int'l Symposium on Information Theory.
 
2
 
3
 
4
Bui, H. H., Venkatesh, S., & West, G. (2002). Policy recognition in the Abstract Hidden Markov Model. Journal of Artificial Intelligence Research, 17.
 
5
 
6
 
7
Frietag, D., & McCallum, A. (1999). Information extraction with HMMs and shrinkage. AAAI Workshop on Machine Learning for Information Extraction.
 
8
 
9
 
10
 
11
 
12
 
13
 
14
 
15
 
16
Mohri, M., Pereira, F., & Riley, M. (2002). Weighted finite-state transducers in speech recognition. Computer Speech and Language, 16, 69--88.
 
17
Murphy, K., & Paskin, M. A. (2001). Linear time inference in hierarchical HMMs. Proceedings of Fifteenth Annual Conference on Neural Information Processing Systems.
 
18
 
19
Murphy, K. P., Weiss, Y., & Jordan, M. I. (1999). Loopy belief propagation for approximate inference: An empirical study. Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI) (pp. 467--475).
 
20
Nefian, A., Liang, L., Pi, X., Xiaoxiang, L., Mao, C., & Murphy, K. (2002). A coupled HMM for audio-visual speech recognition. IEEE Int'l Conference on Acoustics, Speech and Signal Processing (pp. 2013--2016).
 
21
Peng, F., & McCallum, A. (2004). Accurate information extraction from research papers using conditional random fields. Proceedings of Human Language Technology Conference and North American Chapter of the Association for Computational Linguistics (HLT-NAACL'04).
 
22
Peshkin, L., & Pfeffer, A. (2003). Bayesian information extraction network. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI).
23
 
24
Rabiner, L. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77, 257 -- 286.
 
25
Ramshaw, L. A., & Marcus, M. P. (1995). Text chunking using transformation-based learning. Proceedings of the Third ACL Workshop on Very Large Corpora.
 
26
Ratnaparkhi, A. (1996). A maximum entropy model for part-of-speech tagging. Proc. of the 1996 Conference on Empirical Methods in Natural Language Proceeding (EMNLP 1996).
 
27
 
28
 
29
Skounakis, M., Craven, M., & Ray, S. (2003). Hierarchical hidden Markov models for information extraction. Proceedings of the 18th International Joint Conference on Artificial Intelligence.
 
30
Taskar, B., Abbeel, P., & Koller, D. (2002). Discriminative probabilistic models for relational data. Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI02).
 
31
Theocharous, G., Rohanimanesh, K., & Mahadevan, S. (2001). Learning hierarchical partially observable Markov decision processes for robot navigation. Proceedings of the IEEE Conference on Robotics and Automation.
 
32
 
33
Wainwright, M., Jaakkola, T., & Willsky, A. (2001). Tree-based reparameterization for approximate estimation on graphs with cycles. Advances in Neural Information Processing Systems (NIPS).
 
34
Yedidia, J., Freeman, W., & Weiss, Y. (2000). Generalized belief propagation. Advances in Neural Information Processing Systems (NIPS).

CITED BY  30
Collaborative Colleagues:
Charles Sutton: colleagues
Khashayar Rohanimanesh: colleagues
Andrew McCallum: colleagues