Cost-based labeling of groups of mass spectra
Full text: PDF (351 KB)
Source: International Conference on Management of Data
Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data
Paris, France
SESSION: Research sessions: data mining applications
Pages: 167-178
Year of Publication: 2004
ISBN: 1-58113-859-8
Authors
Lei Chen, University of Wisconsin, Madison, WI
Zheng Huang, University of Wisconsin, Madison, WI
Raghu Ramakrishnan, University of Wisconsin, Madison, WI
Sponsor
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 3,   Downloads (12 Months): 31,   Citation Count: 1
DOI: http://doi.acm.org/10.1145/1007568.1007589

ABSTRACT

We make two main contributions in this paper. First, we motivate and introduce a novel class of data mining problems that arise in labeling a group of mass spectra, specifically for analysis of atmospheric aerosols, but with natural applications to market-basket datasets. This builds upon other recent work in which we introduced the problem of labeling a single spectrum, and is motivated by the advent of a new generation of Aerosol Time-of-Flight Spectrometers, which are capable of generating mass spectra for hundreds of aerosol particles per minute. We also describe two algorithms for group labeling, which differ greatly in how they utilize a linear programming (LP) solver, and also differ greatly from algorithms for labeling a single spectrum.

Our second contribution is to show how to automatically select between these two algorithms in a cost-based manner, analogous to how a query optimizer selects from a space of query plans. While the details are specific to the labeling problem, we believe that it is a promising first step towards a general framework for cost-based data mining, and opens up an important direction for future research.


