ABSTRACT
In this paper, we survey recent work on incremental maintenance of data mining models and on change detection under block evolution. Under block evolution, a dataset is updated periodically through insertions and deletions of entire blocks of records. We describe two techniques: (1) a generic algorithm for model maintenance that takes any traditional incremental model maintenance algorithm and transforms it into one that allows restrictions to a temporal subset of the database, and (2) a generic framework for change detection that quantifies the difference between two datasets in terms of the data mining models they induce.
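To make the two ideas concrete, the following is a minimal sketch and not the algorithms surveyed in the paper. It assumes a deliberately simple "model" (item counts), restricts it to the most recent blocks as one possible temporal subset, and measures change as the L1 distance between the relative frequencies induced by two datasets. The names `WindowedItemCounts`, `add_block`, and `deviation` are hypothetical and chosen only for illustration.

```python
from collections import Counter, deque


class WindowedItemCounts:
    """Hypothetical stand-in for a mined model: item counts over the
    most recent `window` blocks (one possible temporal restriction)."""

    def __init__(self, window):
        self.window = window
        self.blocks = deque()    # per-block counts, oldest first
        self.counts = Counter()  # model over the current window

    def add_block(self, block):
        """Insert a new block; delete the oldest once the window is full."""
        block_counts = Counter(block)
        self.blocks.append(block_counts)
        self.counts.update(block_counts)
        if len(self.blocks) > self.window:
            expired = self.blocks.popleft()
            for item, n in expired.items():
                self.counts[item] -= n
                if self.counts[item] <= 0:
                    del self.counts[item]


def deviation(counts_a, counts_b):
    """Illustrative change measure (an assumption, not the paper's):
    L1 distance between the relative frequencies of two count models."""
    total_a = sum(counts_a.values()) or 1
    total_b = sum(counts_b.values()) or 1
    items = set(counts_a) | set(counts_b)
    return sum(
        abs(counts_a.get(i, 0) / total_a - counts_b.get(i, 0) / total_b)
        for i in items
    )


model = WindowedItemCounts(window=2)
for block in (["a", "b"], ["a", "c"], ["c", "c"]):
    model.add_block(block)
```

After the third block arrives, the first block has been deleted from the window, so the maintained counts reflect only the two most recent blocks. Calling `deviation` on the models of two datasets gives 0 when their relative frequencies coincide and grows as they diverge.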