| Banded structure in binary matrices |
| Full text |
Pdf
(531 KB)
|
Source
|
International Conference on Knowledge Discovery and Data Mining
archive
Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
table of contents
Las Vegas, Nevada, USA
SESSION: Research papers
table of contents
Pages 292-300
Year of Publication: 2008
ISBN:978-1-60558-193-4
|
|
Authors
|
|
Gemma C. Garriga
|
Helsinki University of Technology, Helsinki, Finland
|
|
Esa Junttila
|
University of Helsinki, Helsinki, Finland
|
|
Heikki Mannila
|
University of Helsinki and Helsinki University of Technology, Helsinki, Finland
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 17, Downloads (12 Months): 227, Citation Count: 0
|
|
|
ABSTRACT
A 0--1 matrix has a banded structure if both rows and columns can be permuted so that the non-zero entries exhibit a staircase pattern of overlapping rows. The concept of banded matrices has its origins in numerical analysis, where entries can be viewed as descriptions between the problem variables; the bandedness corresponds to variables that are coupled over short distances. Banded data occurs also in other applications, for example in the physical mapping problem of the human genome, in paleontological data, in network data and in the discovery of overlapping communities without cycles. We study in this paper the banded structure of binary matrices, give a formal definition of the concept and discuss its theoretical properties. We consider the algorithmic problems of computing how far a matrix is from being banded, and of finding a good submatrix of the original data that exhibits approximate bandedness. Finally, we show by experiments on real data from ecology and other applications the usefulness of the concept. Our results reveal that bands exist in real datasets and that the final obtained ordering of rows and columns have natural interpretations.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Rakesh Agrawal , Tomasz Imieliński , Arun Swami, Mining association rules between sets of items in large databases, Proceedings of the 1993 ACM SIGMOD international conference on Management of data, p.207-216, May 25-28, 1993, Washington, D.C., United States
|
| |
2
|
F. Alizadeh, R. M. Karp, L. A. Newberg, and D. K. Weisser. Physical mapping of chromosomes: A combinatorial problem in molecular biology. Algorithmica, 13(1/2):52--76, 1995.
|
| |
3
|
|
| |
4
|
|
| |
5
|
|
 |
6
|
Arindam Banerjee , Chase Krumpelman , Joydeep Ghosh , Sugato Basu , Raymond J. Mooney, Model-based overlapping clustering, Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, August 21-24, 2005, Chicago, Illinois, USA
[doi> 10.1145/1081870.1081932]
|
| |
7
|
|
| |
8
|
|
| |
9
|
|
 |
10
|
|
| |
11
|
M. Fortelius. Neogene of the old world database of fossil mammals (NOW). http://www.helsinki.fi/science/now/, 2008.
|
| |
12
|
K. K. Puolam¨aki, M. Fortelius, and H. Mannila. Seriation in paleontological data using Markov chain Monte Carlo methods. PLoS Comput Biol, 2, 2006.
|
| |
13
|
I.-J. Lin and D. B. West. Interval digraphs that are indifference digraphs. In Graph theory, Combinatorics, and Algorithms, pages 751--765, 1992.
|
 |
14
|
|
| |
15
|
|
| |
16
|
S. Myllykangas, J. Himberg, T. Böhling, B. Nagy, J. Hollm´en, and S. Knuutila. Dna copy number amplification profiling of human neoplasms. Oncogene, 25:7324--7332, 2006.
|
| |
17
|
C. H. Papadimitriou. The NP-completeness of the bandwidth minimization problem. Computing, 16:263--270, 1976.
|
| |
18
|
F. S. Roberts. Indifference graphs. In Proof Techniques in Graph Theory, pages 139--146, 1969.
|
 |
19
|
|
| |
20
|
|
| |
21
|
A. Tucker. A structure theorem for the consecutive 1's property. Journal of Combinatorial Theory, Series B, 12(2):153--162, 1972.
|
|