ACM Home Page
Please provide us with feedback. Feedback
Clustered subset selection and its applications on it service metrics
Full text PdfPdf (401 KB)
Source
Conference on Information and Knowledge Management archive
Proceeding of the 17th ACM conference on Information and knowledge management table of contents
Napa Valley, California, USA
SESSION: KM: statistical techniques table of contents
Pages 599-608  
Year of Publication: 2008
ISBN:978-1-59593-991-3
Authors
Christos Boutsidis  Rensselaer Polytechnic Institute, Troy, NY, USA
Jimeng Sun  IBM T.J. Watson Lab, Hawthorne, NY, USA
Nikos Anerousis  IBM T.J. Watson Lab, Hawthorne, NY, USA
Sponsors
ACM: Association for Computing Machinery
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 12,   Downloads (12 Months): 109,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1458082.1458162
What is a DOI?

ABSTRACT

Motivated by the enormous amounts of data collected in a large IT service provider organization, this paper presents a method for quickly and automatically summarizing and extracting meaningful insights from the data. Termed Clustered Subset Selection (CSS), our method enables program-guided data explorations of high-dimensional data matrices. CSS combines clustering and subset selection into a coherent and intuitive method for data analysis. In addition to a general framework, we introduce a family of CSS algorithms with different clustering components such as k-means and Close-to-Rank-One (CRO) clustering, and Subset Selection components such as best rank-one approximation and Rank-Revealing QR (RRQR) decomposition.

From an empirical perspective, we illustrate that CSS is achieving significant improvements over existing Subset Selection methods in terms of approximation errors. Compared to existing Subset Selection techniques, CSS is also able to provide additional insight about clusters and cluster representatives. Finally, we present a case-study of program-guided data explorations using CSS on a large amount of IT service delivery data collection.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
M. Berry, S. Pulatova, and G. Stewart. Computing sparse reduced-rank approximations to sparse matrices. Technical Report UMIACS TR-2004-32 CMSC TR-4589, University of Maryland, College Park, MD, 2004.
3
4
5
 
6
 
7
T. F. Chan. Rank revealing QR factorizations. Linear Algebra and Its Applications, 88/89:67--82, 1987.
 
8
 
9
T. F. Chan and P. C. Hansen. Low-rank revealing QR factorizations. Numerical Linear Algebra with Applications, 1:33--44, 1994.
 
10
 
11
S. J. Cho and M. A. Hermsmeier. Genetic algorithm guided selection: Variable selection and subset selection. Journal of Chemical Information and Computer Sciences, 42(4):927--936, 2002.
 
12
13
 
14
A. Deshpande and S. Vempala. Adaptive sampling and fast low-rank matrix approximation. In RANDOM - APPROX, 2006.
 
15
 
16
P. Drineas. Randomized Algorithms for Matrix Operations. PhD thesis, Yale University, 2002.
 
17
 
18
19
 
20
L. V. Foster. Rank and null space calculations using matrix decomposition without column interchanges. Linear Algebra Appl., 74:47--71, 1986.
 
21
L. V. Foster and X. Liu. Comparison of rank revealing algorithms applied to matrices with well defined numerical ranks. submitted, 2006.
 
22
 
23
E. Gallopoulos and D. Zeimpekis. CLSI: A flexible approximation scheme from clustered term-document matrices. In SDM, 2005.
 
24
G. Golub and C. V. Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, 1989.
 
25
G. H. Golub. Numerical methods for solving linear least squares problems. Numer. Math., 7:206--216, 1965.
 
26
 
27
 
28
Y. P. Hong and C. T. Pan. Rank-revealing QR factorizations and the singular value decomposition. Mathematics of Computation, 58:213--232, 1992.
29
 
30
 
31
 
32
 
33
Y.-D. Kim and S. Choi. A method of initialization for nonnegative matrix factorization. In Acoustics, Speech and Signal Processing (ICASSP), 2007.
 
34
 
35
C. T. Pan. On the existence and computation of rank-revealing LU factorizations. Linear Algebra Appl., 316:199--222, 2000.
 
36
C. T. Pan and P. T. P. Tang. Bounds on singular values revealed by QR factorizations. BIT Numerical Mathematics, 39:740--756, 1999.
37
 
38
 
39
 
40
G. Stewart. Four algorithms for the efficient computation of truncated QR approximations to a sparse matrix. Numerische Mathematik, 83:313--323, 1999.
 
41
G. Stewart and J. Sun. Matrix Perturbation Theory. Academic Press, New York, 1990.
 
42

Collaborative Colleagues:
Christos Boutsidis: colleagues
Jimeng Sun: colleagues
Nikos Anerousis: colleagues