Dynamic hybrid clustering of bioinformatics by incorporating text mining and citation analysis
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
San Jose, California, USA
SESSION: Research track papers
Pages: 360-369
Year of Publication: 2007
ISBN: 978-1-59593-609-7
Authors
Frizo Janssens  Katholieke Universiteit Leuven
Wolfgang Glänzel  Katholieke Universiteit Leuven
Bart De Moor  Katholieke Universiteit Leuven
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1281192.1281233

ABSTRACT

To unravel the concept structure and dynamics of the bioinformatics field, we analyze a set of 7401 publications from the Web of Science and MEDLINE databases, published between 1981 and 2004. To delineate this complex, interdisciplinary field, we use a novel bibliometric retrieval strategy. Because unsupervised clustering and classification of scientific publications perform significantly better when textual content is deeply merged with the structure of the citation graph, we adopt a hybrid clustering method based on Fisher's inverse chi-square. The optimal number of clusters is determined by a compound semiautomatic strategy that combines distance-based and stability-based methods. We also investigate the relationship between the number of Latent Semantic Indexing factors, the number of clusters, and clustering performance. The HITS and PageRank algorithms are used to identify representative publications in each cluster. Next, we develop a methodology for dynamic hybrid clustering of evolving bibliographic data sets. The same clustering methodology is applied to consecutive periods defined by time windows on the set; in a subsequent phase, chains are formed by matching and tracking clusters through time. Term networks for the eleven resulting cluster chains present the cognitive structure of the field. Finally, we provide a view of how much attention the bioinformatics community has devoted to the different subfields over time.
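
The abstract names Fisher's inverse chi-square as the basis for merging textual and citation information. As a rough illustration of that statistical building block only, the sketch below combines independent p-values into one. It assumes that each data source (text, citations) has already produced a p-value for a given pair of documents; how the paper derives those p-values is not stated in the abstract, and the function name `fisher_combine` and the sample values are hypothetical.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's inverse chi-square method.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    # Fisher's statistic X = -2 * sum(ln p_i) is chi-square distributed
    # with 2k degrees of freedom under the null hypothesis.
    half_x = -sum(math.log(p) for p in p_values)

    # For an even number of degrees of freedom the chi-square survival
    # function has a closed form: exp(-x/2) * sum_{i<k} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


# Assumed example values: one p-value from text similarity, one from citations.
p_text, p_citation = 0.04, 0.20
combined = fisher_combine([p_text, p_citation])
print(f"combined p-value: {combined:.4f}")
```

A smaller combined p-value would indicate that two documents are unusually close in both views at once. Such values could then serve as integrated distances for a clustering algorithm, although that step is an assumption here rather than something the abstract spells out.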


