| Learning probabilistic automata with variable memory length |
| Full text |
Pdf
(1.31 MB)
|
| Source
|
Annual Workshop on Computational Learning Theory
archive
Proceedings of the seventh annual conference on Computational learning theory
table of contents
New Brunswick, New Jersey, United States
Pages: 35 - 46
Year of Publication: 1994
ISBN:0-89791-655-7
|
|
Authors
|
|
Dana Ron
|
Institute of Computer Science and Center for Neural Computation, Hebrew University, Jerusalem 91904, Israel
|
|
Yoram Singer
|
Institute of Computer Science and Center for Neural Computation, Hebrew University, Jerusalem 91904, Israel
|
|
Naftali Tishby
|
Institute of Computer Science and Center for Neural Computation, Hebrew University, Jerusalem 91904, Israel
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 9, Downloads (12 Months): 36, Citation Count: 12
|
|
|
ABSTRACT
We propose and analyze a distribution learning algorithm for variable memory length Markov processes. These processes can be described by a subclass of probabilistic finite automata which we name Probabilistic Finite Suffix Automata. The learning algorithm is motivated by real applications in man-machine interaction such as hand-writing and speech recognition. Conventionally used fixed memory Markov and hidden Markov models have either severe practical or theoretical drawbacks. Though general hardness results are known for learning distributions generated by sources with similar structure, we prove that our algorithm can indeed efficiently learn distributions generated by our more restricted sources. In Particular, we show that the KL-divergence between the distribution generated by the target source and the distribution generated by our hypothesis can be made small with high confidence in polynomial time and sample complexity. We demonstrate the applicability of our algorithm by learning the structure of natural English text and using our hypothesis for the correction of corrupted text.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
Lewis Carroll. Alice's adventures in wonderland. The Millennium Fulcrum edition 2.9.
|
| |
3
|
|
| |
4
|
J.A. Fill. Eigenvalue bounds on convergence to stationary for nonreversible Markov chains, with an application to exclusion process. Annals of Applied Probability, 1:62-87, 1991.
|
 |
5
|
Yoav Freund , Michael Kearns , Dana Ron , Ronitt Rubinfeld , Robert E. Schapire , Linda Sellie, Efficient learning of typical finite automata from random walks, Proceedings of the twenty-fifth annual ACM symposium on Theory of computing, p.315-324, May 16-18, 1993, San Diego, California, United States
[doi> 10.1145/167088.167191]
|
 |
6
|
|
| |
7
|
F. Jelinek. Self-organized language modeling for speech recognition. Technical report, IBM T.J. Watson Research Center, 1985.
|
 |
8
|
Michael Kearns , Yishay Mansour , Dana Ron , Ronitt Rubinfeld , Robert E. Schapire , Linda Sellie, On the learnability of discrete distributions, Proceedings of the twenty-sixth annual ACM symposium on Theory of computing, p.273-282, May 23-25, 1994, Montreal, Quebec, Canada
[doi> 10.1145/195058.195155]
|
| |
9
|
|
| |
10
|
M. Mihail. Conductance and convergence of Markov chains - A combinatorial treatment of expanders. In Proceedings 30th Annual Conference on Foundations of Computer Science, 1989.
|
| |
11
|
A. Nadas. Estimation of probabilities in the language model of the IBM speech recognition system. IEEE Trans. on ASSP, 32(4):859-861, 1984.
|
| |
12
|
L.R. Rabiner. A tutorial on hidden markov models and selected applications in speech recognition. Proceedings of the IEEE, 1989.
|
| |
13
|
L.R. Rabiner and B. H. Juang. An introduction to hidden markov models. IEEE ASSP Magazine, 3(1):4-16, January 1986.
|
| |
14
|
J. Rissanen. A universal data compression system. IEEE Transactions on Information Theory, 29(5):656-664, 1983.
|
| |
15
|
D. Ron, Y. Singer, and N. Tishby. The power of amnesia. In Advances in Neural Information Processing Systems, volume 6. Morgan Kaufmann, 1993.
|
| |
16
|
H. 5chiitze and Y. Singer. Fart-of-Speech tagging using a variable memory Markov model. In Proceedings of ACL 3~'nd, 1994.
|
| |
17
|
M.J. Weinberger, A. Lempel, and J. Ziv. A sequential algorithm for the universal coding of finitememory sources. IEEE Transactions on Information Theory, 38:1002-1014, May 1982.
|
| |
18
|
M.J. Weinberger, J. Rissanen, and M. Feder. A universal finite memory source. Submitted for publication.
|
CITED BY 12
|
|
Yoav Freund , Robert E. Schapire , Yoram Singer , Manfred K. Warmuth, Using and combining predictors that specialize, Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, p.334-343, May 04-06, 1997, El Paso, Texas, United States
|
|
|
Dana Ron , Yoram Singer , Naftali Tishby, On the learnability and usage of acyclic probabilistic finite automata, Proceedings of the eighth annual conference on Computational learning theory, p.31-40, July 05-08, 1995, Santa Cruz, California, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Robert Malkin , Datong Chen , Jie Yang , Alex Waibel, Multimodal estimation of user interruptibility for smart mobile telephones, Proceedings of the 8th international conference on Multimodal interfaces, November 02-04, 2006, Banff, Alberta, Canada
|
|
|
|
|
|
|
|