Pragmatic text mining: minimizing human effort to quantify many issues in call logs
Source: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Philadelphia, PA, USA
Session: Industrial and government applications track papers
Pages: 852 - 861  
Year of Publication: 2006
ISBN: 1-59593-339-5
Authors
George Forman  Hewlett-Packard Labs, Palo Alto, CA
Evan Kirshenbaum  Hewlett-Packard Labs, Palo Alto, CA
Jaap Suermondt  Hewlett-Packard Labs, Palo Alto, CA
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1150402.1150520

ABSTRACT

We discuss our experiences in analyzing customer-support issues from the unstructured free-text fields of technical-support call logs. Identifying frequent issues and quantifying them accurately are essential for tracking aggregate costs broken down by issue type, for targeting engineering resources appropriately, and for providing the best diagnosis, support, and documentation for the most common issues. We present a new set of techniques for doing this efficiently on an industrial scale, without requiring manual coding of calls in the call center. Our approach involves (1) a new text clustering method to identify common and emerging issues; (2) a method to rapidly train large numbers of categorizers in a practical, interactive manner; and (3) a method to accurately quantify categories, even in the face of inaccurate classifications and training sets that necessarily cannot match the class distribution of each new month's data. We present our methodology and a tool we developed and deployed that uses these methods for tracking ongoing support issues and discovering emerging issues at HP.
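
To make point (3) concrete: the abstract does not say which quantification method the tool uses, so the following is only an illustrative sketch of one well-known correction, often called the adjusted count. It assumes a binary categorizer whose true positive rate (tpr) and false positive rate (fpr) have been estimated beforehand, for example by cross-validation on the training set; the function name and parameters are hypothetical.

```python
def adjusted_count(observed_positive_rate: float, tpr: float, fpr: float) -> float:
    """Estimate the true fraction of positives in a batch of cases.

    observed_positive_rate: fraction of the batch the classifier labeled positive.
    tpr, fpr: the classifier's true/false positive rates (assumed to have been
              estimated separately, e.g. by cross-validation).

    The observed rate satisfies
        observed = p * tpr + (1 - p) * fpr
    where p is the true prevalence, so solving for p gives
        p = (observed - fpr) / (tpr - fpr).
    """
    if tpr <= fpr:
        raise ValueError("correction requires tpr > fpr")
    estimate = (observed_positive_rate - fpr) / (tpr - fpr)
    # Clip to [0, 1]: sampling noise can push the raw estimate out of range.
    return min(1.0, max(0.0, estimate))


# Hypothetical numbers: the classifier flags 24% of this month's calls,
# with tpr = 0.8 and fpr = 0.1 estimated on training data.
estimated_prevalence = adjusted_count(0.24, tpr=0.8, fpr=0.1)
```

With these hypothetical numbers, a naive count would report 24% of calls in the category, while the corrected estimate is close to 20%. The correction depends only on the classifier's error rates, not on the class distribution of the training set, which is why this style of estimate can tolerate a shift in class distribution from month to month.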



