| Pragmatic text mining: minimizing human effort to quantify many issues in call logs |
| Source
|
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Philadelphia, PA, USA
SESSION: Industrial and government applications track papers
Pages: 852 - 861
Year of Publication: 2006
ISBN: 1-59593-339-5
|
ABSTRACT
We discuss our experiences in analyzing customer-support issues from the unstructured free-text fields of technical-support call logs. Identifying frequent issues and quantifying them accurately is essential to track aggregate costs broken down by issue type, to target engineering resources appropriately, and to provide the best diagnosis, support and documentation for the most common issues. We present a new set of techniques for doing this efficiently on an industrial scale, without requiring manual coding of calls in the call center. Our approach involves (1) a new text clustering method to identify common and emerging issues; (2) a method to rapidly train large numbers of categorizers in a practical, interactive manner; and (3) a method to accurately quantify categories, even in the face of inaccurate classifications and training sets that necessarily cannot match the class distribution of each new month's data. We present our methodology and a tool we developed and deployed that uses these methods for tracking ongoing support issues and discovering emerging issues at HP.
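The abstract does not say which estimator the tool uses for step (3). As an illustration only, the sketch below shows one well-known correction for this kind of problem, often called the adjusted count: a classifier's raw positive rate on new data is rescaled using its true-positive and false-positive rates, which are assumed to have been measured beforehand (for example, by cross-validation on the training set). The function name and the numbers are hypothetical.

```python
def adjusted_count(observed_positive_rate, tpr, fpr):
    """Estimate the true fraction of positives in a batch of cases.

    observed_positive_rate: fraction of the batch the classifier labeled positive
    tpr: classifier's true-positive rate (assumed measured on held-out data)
    fpr: classifier's false-positive rate (assumed measured on held-out data)
    """
    if tpr <= fpr:
        raise ValueError("the correction needs a classifier with tpr > fpr")
    # observed = p * tpr + (1 - p) * fpr, solved for the true prevalence p
    estimate = (observed_positive_rate - fpr) / (tpr - fpr)
    # Sampling noise can push the estimate outside [0, 1], so clip it.
    return min(1.0, max(0.0, estimate))


# Hypothetical month of data: the classifier flags 24% of calls for an issue,
# with tpr = 0.8 and fpr = 0.1 estimated from training data.
estimated_share = adjusted_count(0.24, tpr=0.8, fpr=0.1)
print(f"estimated share of calls for this issue: {estimated_share:.2f}")
```

The correction depends only on the classifier's error rates, not on the class distribution of the training set, which is why an estimate of this kind can remain usable when the mix of issues shifts from month to month.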