| Fragments of order |
| Full text |
Pdf
(136 KB)
|
| Source
|
International Conference on Knowledge Discovery and Data Mining
archive
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
table of contents
Washington, D.C.
SESSION: Research track
table of contents
Pages: 129 - 136
Year of Publication: 2003
ISBN:1-58113-737-0
|
|
Authors
|
|
Aristides Gionis
|
Stanford University, Stanford, CA
|
|
Teija Kujala
|
University of Helsinki, P.O. Box 26, Teollisuuskatu 23, Helsinki, Finland
|
|
Heikki Mannila
|
University of Helsinki, P.O. Box 26, Teollisuuskatu 23, Helsinki, Finland
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 9, Downloads (12 Months): 43, Citation Count: 7
|
|
|
ABSTRACT
High-dimensional collections of 0--1 data occur in many applications. The attributes in such data sets are typically considered to be unordered. However, in many cases there is a natural total or partial order ≺ underlying the variables of the data set. Examples of variables for which such orders exist include terms in documents, courses in enrollment data, and paleontological sites in fossil data collections. The observations in such applications are flat, unordered sets; however, the data sets respect the underlying ordering of the variables. By this we mean that if A ≺ B ≺ C are three variables respecting the underlying ordering ≺, and both of variables A and C appear in an observation, then, up to noise levels, variable B also appears in this observation. Similarly, if A1 ≺ A2 ≺ … ≺ Al-1 ≺ Ai is a longer sequence of variables, we do not expect to see many observations for which there are indices i < j < k such that Ai and Ak occur in the observation but Aj does not.In this paper we study the problem of discovering fragments of orders of variables implicit in collections of unordered observations. We define measures that capture how well a given order agrees with the observed data. We describe a simple and efficient algorithm for finding all the fragments that satisfy certain conditions. We also discuss the sometimes necessary postprocessing for selecting only the best fragments of order. Also, we relate our method with a sequencing approach that uses a spectral algorithm, and with the consecutive ones problem. We present experimental results on some real data sets (author lists of database papers, exam results data, and paleontological data).
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Rakesh Agrawal , Tomasz Imieliński , Arun Swami, Mining association rules between sets of items in large databases, Proceedings of the 1993 ACM SIGMOD international conference on Management of data, p.207-216, May 25-28, 1993, Washington, D.C., United States
|
| |
2
|
|
| |
3
|
|
 |
4
|
|
| |
5
|
K. S. Booth and G. S. Lueker. Testing for the consecutive ones property, interval graphs, and graph planarity using P-Q tree algorithms. J. of Comp. and Syst. Sci., 13:335--379, 1976.
|
| |
6
|
T. F. Chan and D. C. Resasco. A framework for the analysis and construction of domain decomposition preconditioners. Technical Report CAM-87-09, UCLA, 1987.
|
| |
7
|
F. R. K. Chung. Spectral Graph Theory. CBMS Regional Conference Series in Mathematics, 1997.
|
| |
8
|
R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge, 1998.
|
| |
9
|
M. Fortelius. Private communication. 2003.
|
| |
10
|
|
| |
11
|
|
| |
12
|
J. Jernvall and M. Fortelius. Common mammals drive the evolutionary increase of hypsodonty in the neogene. Nature, 417:538--540, 2002.
|
| |
13
|
Y. Koren and D. Harel. Multi-scale algorithm for the linear arrangement problem. Technical Report MCS02-04, Faculty of Mathematics and Computer Science, The Weizmann Institute of Science, 2002.
|
 |
14
|
|
| |
15
|
|
| |
16
|
A. Ng, M. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In In Advances in Neural Information Processing Systems, 2001.
|
| |
17
|
|
| |
18
|
A. Pothen, H. Simon, and L. Wang. Spectral nested dissection. Technical Report CS-92-01, Pennsylvania State University, Department of Computer Science, 1992.
|
| |
19
|
|
| |
20
|
H. D. Simon. Partitioning of unstructured mesh problems for parallel processing. Computing Systems in Engineering, 2, 1991.
|
CITED BY 7
|
|
|
|
|
|
|
|
Aristides Gionis , Heikki Mannila , Kai Puolamäki , Antti Ukkonen, Algorithms for discovering bucket orders from data, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
|
|
|
David Johnson , Shankar Krishnan , Jatin Chhugani , Subodh Kumar , Suresh Venkatasubramanian, Compressing large boolean matrices using reordering techniques, Proceedings of the Thirtieth international conference on Very large data bases, p.13-23, August 31-September 03, 2004, Toronto, Canada
|
|
|
Hannes Heikinheimo , Eino Hinkkanen , Heikki Mannila , Taneli Mielikäinen , Jouni K. Seppänen, Finding low-entropy sets and trees from binary data, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, August 12-15, 2007, San Jose, California, USA
|
|
|
|
|
|
Jian Pei , Haixun Wang , Jian Liu , Ke Wang , Jianyong Wang , Philip S. Yu, Discovering Frequent Closed Partial Orders from Strings, IEEE Transactions on Knowledge and Data Engineering, v.18 n.11, p.1467-1481, November 2006
|
|