| Cost-based labeling of groups of mass spectra |
| Source
|
International Conference on Management of Data
Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Paris, France
SESSION: Research sessions: data mining applications
Pages: 167 - 178
Year of Publication: 2004
ISBN:1-58113-859-8
ABSTRACT
We make two main contributions in this paper. First, we motivate and introduce a novel class of data mining problems that arise in labeling a group of mass spectra, specifically for analysis of atmospheric aerosols, but with natural applications to market-basket datasets. This builds upon other recent work in which we introduced the problem of labeling a single spectrum, and is motivated by the advent of a new generation of Aerosol Time-of-Flight Spectrometers, which are capable of generating mass spectra for hundreds of aerosol particles per minute. We also describe two algorithms for group labeling, which differ greatly in how they utilize a linear programming (LP) solver, and also differ greatly from algorithms for labeling a single spectrum. Our second contribution is to show how to automatically select between these two algorithms in a cost-based manner, analogous to how a query optimizer selects from a space of query plans. While the details are specific to the labeling problem, we believe that this is a promising first step towards a general framework for cost-based data mining, and that it opens up an important direction for future research.