A consensus based approach to constrained clustering of software requirements
Source
Conference on Information and Knowledge Management
Proceedings of the 17th ACM Conference on Information and Knowledge Management
Napa Valley, California, USA
Session: KM: clustering
Pages 1073-1082
Year of Publication: 2008
ISBN: 978-1-59593-991-3
Authors
Chuan Duan, DePaul University, Chicago, IL, USA
Jane Cleland-Huang, DePaul University, Chicago, IL, USA
Bamshad Mobasher, DePaul University, Chicago, IL, USA
Sponsors
ACM: Association for Computing Machinery
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1458082.1458225

ABSTRACT

Managing large-scale software projects involves a number of activities, such as viewpoint extraction, feature detection, and requirements management, all of which require a human analyst to perform the arduous task of organizing requirements into meaningful topics and themes. Automating these tasks with data mining techniques such as clustering could increase both the efficiency of the work and the reliability of the results. Unfortunately, the characteristics of this domain present difficult challenges for standard clustering algorithms: the data sets are high-dimensional, sparse, and noisy because needs are expressed in short and ambiguous statements, and stakeholders must be engaged interactively at various stages of the process. In this paper, we propose a semi-supervised clustering framework, based on a combination of consensus-based and constrained clustering techniques, which can effectively handle these challenges. Specifically, we provide a probabilistic analysis for informative constraint generation based on a co-association matrix, and utilize consensus clustering to combine multiple constrained partitions in order to generate high-quality, robust clusters. Our approach is validated through a series of experiments on six well-studied TREC data sets and on two sets of user requirements.
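
The co-association idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the 0.5 threshold, the "closest to 0.5" heuristic for picking pairs to ask an analyst about, and the toy partitions are all assumptions made for illustration. The sketch builds a co-association matrix from several partitions of the same items, lists the pairs on which the partitions disagree most, and derives a simple consensus partition by linking items that co-occur in more than half of the partitions.

```python
def co_association(partitions):
    """Entry (i, j) is the fraction of partitions that put items i and j
    in the same cluster. Each partition is a list of cluster labels."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


def most_uncertain_pairs(matrix, k):
    """Assumed heuristic: pairs whose co-association is closest to 0.5 are
    the ones the partitions disagree on most, so they are natural
    candidates for a must-link / cannot-link question to an analyst."""
    n = len(matrix)
    scored = [(abs(matrix[i][j] - 0.5), i, j)
              for i in range(n) for j in range(i + 1, n)]
    scored.sort()
    return [(i, j) for _, i, j in scored[:k]]


def consensus_clusters(matrix, threshold=0.5):
    """Simple consensus: link items whose co-association exceeds the
    threshold and return the connected components as cluster labels."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy example: three partitions of five requirements (labels are arbitrary,
# so the third partition uses swapped label names on purpose).
partitions = [
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
]
matrix = co_association(partitions)
print(consensus_clusters(matrix))
print(most_uncertain_pairs(matrix, 4))
```

In this toy case, the third item is the one the partitions disagree about, so every pair flagged as uncertain involves it. A real system would replace the connected-components step with a stronger consensus function and would feed the analyst's answers back as constraints.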

