ABSTRACT
Managing large-scale software projects involves a number of activities, such as viewpoint extraction, feature detection, and requirements management, all of which require a human analyst to perform the arduous task of organizing requirements into meaningful topics and themes. Automating these tasks with data mining techniques such as clustering could increase both the efficiency of performing the tasks and the reliability of the results. Unfortunately, the unique characteristics of this domain present difficult challenges for standard clustering algorithms: the data sets are high-dimensional, sparse, and noisy because needs are expressed briefly and ambiguously, and stakeholders must be engaged interactively at various stages of the process. In this paper, we propose a semi-supervised clustering framework, based on a combination of consensus-based and constrained clustering techniques, that can effectively handle these challenges. Specifically, we provide a probabilistic analysis for informative constraint generation based on a co-association matrix, and we use consensus clustering to combine multiple constrained partitions into high-quality, robust clusters. Our approach is validated through a series of experiments on six well-studied TREC data sets and on two sets of user requirements.
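To make the co-association idea concrete, the following is a minimal sketch rather than the method developed in the paper. It assumes each partition is given as a list of cluster labels, one per requirement, and it computes the fraction of partitions in which each pair of items shares a cluster. The helper `uncertain_pairs` and its `low`/`high` thresholds are hypothetical: they stand in for the probabilistic analysis mentioned above, whose details are not given here, by simply flagging pairs on which the partitions disagree as candidates for stakeholder feedback.

```python
from itertools import combinations


def co_association(partitions):
    """Return an n x n matrix whose (i, j) entry is the fraction of
    partitions that place items i and j in the same cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


def uncertain_pairs(matrix, low=0.3, high=0.7):
    """Hypothetical heuristic: pairs whose co-association lies strictly
    between the thresholds, i.e. pairs the partitions disagree on."""
    n = len(matrix)
    return [(i, j) for i, j in combinations(range(n), 2)
            if low < matrix[i][j] < high]


# Three toy partitions of four items (cluster label ids are arbitrary).
partitions = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
matrix = co_association(partitions)
pairs = uncertain_pairs(matrix)
```

In this toy input, items 0 and 1 always co-occur, so their entry is 1.0 and no constraint on them would be informative, while the pairs returned by `uncertain_pairs` are the ones where a must-link or cannot-link answer could change the outcome.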