|
ABSTRACT
Hidden variables, evolving over time, appear in multiple settings, where it is valuable to recover them, typically from observed sums. Our driving application is 'network tomography', where we need to estimate the origin-destination (OD) traffic flows to determine, e.g., who is communicating with whom in a local area network. This information allows network engineers and managers to solve problems in design, routing, configuration debugging, monitoring and pricing. Unfortunately the direct measurement of the OD traffic is usually difficult, or even impossible; instead, we can easily measure the loads on every link, that is, sums of desirable OD flows.In this paper we propose i-FILTER, a method to solve this problem, which improves the state-of-the-art by (a) introducing explicit time dependence, and by (b) using realistic, non-Gaussian marginals in the statistical models for the traffic flows, as never attempted before. We give experiments on real data, where i-FILTER scales linearly with new observations and out-performs the best existing solutions, in a wide variety of settings. Specifically, on real network traffic measured at CMU, and at AT&T, i-FILTER reduced the estimation errors between 15% and 46% in all cases.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
E. M. Airoldi. Advances in network tomography. Technical Report CMU-CALD-03-101, Carnegie Mellon University, 2003.
|
| |
2
|
E. M. Airoldi and C. Faloutsos. Recovering Latent Time-Series from their Observed Sums. Technical Report CMU-CS-04-130, Carnegie Mellon University, 2004.
|
| |
3
|
Y. Bejerano, Y. Breitbart, M. Garofalakis, and R. Rastogi. Physical topology discovery for large multi-subnet networks. In Proceedings of IEEE INFOCOM, pages 342--352, 2003.
|
 |
4
|
Zhiqiang Bi , Christos Faloutsos , Flip Korn, The "DGX" distribution for mining massive, skewed data, Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, p.17-26, August 26-29, 2001, San Francisco, California
[doi> 10.1145/502512.502521]
|
| |
5
|
J. Cao, D. Davis, S. Van Der Viel, and B. Yu. Time-varying network tomography: router link data. Journal of the American Statistical Association, 95:1063--75, 2000.
|
| |
6
|
J. Cao, D. Davis, S. Van Der Viel, B. Yu, and Z. Zu. A scalable method for estimating network traffic matrices from link counts. Technical report, Bell Labs, 2001.
|
| |
7
|
S. E. Fienberg. The geometry of an r x c contingency table. Annals of Mathematical Statistics, 39:1186--90, 1968.
|
| |
8
|
S. E. Fienberg. The geometry of a two by two contingency table. Journal of the American Statistical Association, 65:694--701, 1970a.
|
| |
9
|
W. R. Gilks and C. Berzuini. Following a moving target | monte carlo inference for dynamic bayesian models. Journal of the Royal Statistical Society, Series B, 63:127--146, 2001.
|
| |
10
|
T. Higuchi. Self-organizing time series model. In A. Doucet, N. de Freitas, and N. Gordon, editors, Sequential Monte Carlo Methods in Practice. Springer-Verlag, 2001.
|
| |
11
|
G. Liang and B. Yu. Pseudo-likelihood estimations in network tomography. In Proceedings of IEEE INFOCOM, 2003.
|
| |
12
|
J. Liu. Monte Carlo Strategies in Scientific Computing. Springer-Verlag, 2001.
|
| |
13
|
B. Mandelbrot. An informational theory of the statistical structure of language. In W. Jackson, editor, Communication Theory. Butterworths, 1953.
|
 |
14
|
A. Medina , N. Taft , K. Salamatian , S. Bhattacharyya , C. Diot, Traffic matrix estimation: existing techniques and new directions, Proceedings of the 2002 conference on Applications, technologies, architectures, and protocols for computer communications, August 19-23, 2002, Pittsburgh, Pennsylvania, USA
|
| |
15
|
Irina Rish , Mark Brodie , Sheng Ma, Accuracy vs. efficiency trade-offs in probabilistic diagnosis, Eighteenth national conference on Artificial intelligence, p.560-566, July 28-August 01, 2002, Edmonton, Alberta, Canada
|
| |
16
|
I. Rish, M. Brodie, N. Odintsova, S. Ma, and G. Grabarnik. Real-time problem determination in distributed systems using active probing. To appear in NOMS, 2004.
|
| |
17
|
C. and G. Casella. Monte Carlo Statistical Methods. Springer-Verlag, 1999.
|
| |
18
|
C. Tebaldi and M. West. Bayesian inference on network traffic using link count data. Journal of the American Statistical Association, 93:557--576, 1998.
|
| |
19
|
R. J. Vanderbei and J. Iannone. An em approach to od matrix estimation. Technical Report SOR 94-04, Princeton University, 1994.
|
| |
20
|
Y. Vardi. Network tomography: estimating source-destination traffic intensities from link data. Journal of the American Statistical Association, 91:365--377, 1996.
|
 |
21
|
|
| |
22
|
|
 |
23
|
Yin Zhang , Matthew Roughan , Nick Duffield , Albert Greenberg, Fast accurate computation of large-scale IP traffic matrices from link loads, Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, June 11-14, 2003, San Diego, CA, USA
|
 |
24
|
Yin Zhang , Matthew Roughan , Carsten Lund , David Donoho, An information-theoretic approach to traffic matrix estimation, Proceedings of the 2003 conference on Applications, technologies, architectures, and protocols for computer communications, August 25-29, 2003, Karlsruhe, Germany
[doi> 10.1145/863955.863990]
|
| |
25
|
G. K. Zipf. Selected Studies of the Principle of Relative Frequency in Language. Harvard University Press, 1932.
|
|