ABSTRACT
A new "herding" algorithm is proposed which directly converts observed moments into a sequence of pseudo-samples. The pseudo-samples respect the moment constraints and may be used to estimate (unobserved) quantities of interest. The procedure allows us to sidestep the usual approach of first learning a joint model (which is intractable) and then sampling from that model (which can easily get stuck in a local mode). Moreover, the algorithm is fully deterministic (avoiding random number generation) and does not need expensive operations such as exponentiation.
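The abstract does not state the update rule, so the following is only a minimal sketch of how moments can be turned into pseudo-samples. It assumes the commonly described herding recursion: pick the state whose features have the largest inner product with the current weights, then add the difference between the target moments and that state's features to the weights. The toy setting (a few ±1 variables, singleton and pairwise-product features, exhaustive enumeration of states) and the names `features` and `herd` are illustrative assumptions, not taken from the paper.

```python
import itertools
import numpy as np

def features(x):
    """Hypothetical feature map: each variable plus every pairwise product."""
    x = np.asarray(x, dtype=float)
    pairs = [x[i] * x[j] for i, j in itertools.combinations(range(len(x)), 2)]
    return np.concatenate([x, pairs])

def herd(target_moments, n_vars, n_steps):
    """Generate pseudo-samples whose average features track target_moments.

    Assumed recursion (a sketch, not necessarily the paper's exact form):
        s_t = argmax_s <w_{t-1}, features(s)>
        w_t = w_{t-1} + target_moments - features(s_t)
    """
    mu = np.array(target_moments, dtype=float)
    # Enumerate all 2**n_vars states; only feasible for small n_vars.
    states = list(itertools.product([-1, 1], repeat=n_vars))
    phi = np.array([features(s) for s in states])
    w = mu.copy()  # initial weights; an arbitrary choice for this sketch
    samples = []
    for _ in range(n_steps):
        k = int(np.argmax(phi @ w))  # deterministic: ties go to the first index
        samples.append(states[k])
        w += mu - phi[k]             # additions only, no exponentiation
    return np.array(samples)
```

A possible use is to compute `target_moments` as the average of `features(x)` over an observed data set, call `herd(target_moments, n_vars, n_steps)`, and then average any function of interest over the returned pseudo-samples. The loop draws no random numbers and evaluates no exponentials, which matches the two properties the abstract emphasizes.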