|
ABSTRACT
Partitions of sequential data exist either per se or as a result of sequence segmentation algorithms. It is often the case that the same timeline is partitioned in many different ways. For example, different segmentation algorithms produce different partitions of the same underlying data points. In such cases, we are interested in producing an aggregate partition, i.e., a segmentation that agrees as much as possible with the input segmentations. Each partition is defined as a set of continuous non-overlapping segments of the timeline. We show that this problem can be solved optimally in polynomial time using dynamic programming. We also propose faster greedy heuristics that work well in practice. We experiment with our algorithms and we demonstrate their utility in clustering the behavior of mobile-phone users and combining the results of different segmentation algorithms on genomic sequences.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
|
| |
3
|
ANDERSON, E. C., AND NOVEMBRE, J. Finding haplotype block boundaries by using the minimum-description-length principle. American Journal of Human Genetics 73 (2003).
|
 |
4
|
|
| |
5
|
|
| |
6
|
|
| |
7
|
DALY, M. J., RIOUX, J. D., SCHAFFNER, S. F., HUDSON,T. J., AND LANDER, E. S. High-resolution haplotype structure in the human genome. Nature Genetics 29 (2001), 229--232.
|
 |
8
|
Cynthia Dwork , Ravi Kumar , Moni Naor , D. Sivakumar, Rank aggregation methods for the Web, Proceedings of the 10th international conference on World Wide Web, p.613-622, May 01-05, 2001, Hong Kong, Hong Kong
[doi> 10.1145/371920.372165]
|
| |
9
|
EAGLE, N. Machine perception and learning of complex social systems. PhD Thesis, Program in Media Arts and Sciences, Massachusetts Institute of Technology (2005).
|
 |
10
|
Ronald Fagin , Ravi Kumar , Mohammad Mahdian , D. Sivakumar , Erik Vee, Comparing and aggregating rankings with ties, Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 14-16, 2004, Paris, France
[doi> 10.1145/1055558.1055568]
|
| |
11
|
FAN, W., AND STOLFO, S. J. Ensemble-based adaptive intrusion detection. In SDM (2002).
|
| |
12
|
|
| |
13
|
|
 |
14
|
|
| |
15
|
|
 |
16
|
|
| |
17
|
|
| |
18
|
|
| |
19
|
|
| |
20
|
JOHNSON, N., KOTZ, Z., AND BALAKRISHNAN, N. Discrete multivariate distributions. John Wiley & Sons, 1997.
|
| |
21
|
|
| |
22
|
|
| |
23
|
KOIVISTO, M., PEROLA, M., VARILO, T., ET AL. An MDL method for finding haplotype blocks and for estimating the strength of haplotype block boundaries. In Pacific Symposium on Biocomputing (2003), pp. 502--513.
|
 |
24
|
|
 |
25
|
|
 |
26
|
|
 |
27
|
|
| |
28
|
|
| |
29
|
|
| |
30
|
PATIL, N., BERNO, A. J., HINDS, D. A., ET AL. Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 294 (2001),1669--70.
|
| |
31
|
|
| |
32
|
SALMENKIVI, M., KERE, J., AND MANNILA, H. Genome segmentation using piecewise constant intensity models and reversible jump MCMC. In Proceedings of the European Conference on Computational Biology (2002), pp. 211--218.
|
| |
33
|
|
| |
34
|
TERZI, E., AND TSAPARAS, P. Efficient algorithms for sequence segmentation. In SDM (2006), p. To appear.
|
| |
35
|
ZHANG, K., CHENG, M., CHEN, T., WATERMAN, M.,AND SUN, F. A dynamic programming algorithm for haplotype block partitioning. PNAS 99 (2002), 7335--7339.
|
|