Simple and effective visual models for gene expression cancer diagnostics
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Chicago, Illinois, USA
SESSION: Research track paper
Pages: 167-176
Year of Publication: 2005
ISBN: 1-59593-135-X

Authors
Gregor Leban, University of Ljubljana, Tržaška 25, Ljubljana, Slovenia
Minca Mramor, University of Ljubljana, Tržaška 25, Ljubljana, Slovenia
Ivan Bratko, University of Ljubljana, Tržaška 25, Ljubljana, Slovenia
Blaž Zupan, University of Ljubljana, Tržaška 25, Ljubljana, Slovenia

ABSTRACT
In this paper we show that diagnostic classes in cancer gene expression data sets, which most often include thousands of features (genes), may be effectively separated with simple two-dimensional plots such as the scatterplot and the radviz graph. The principal innovation proposed in the paper is a method called VizRank, which scores candidate projections and identifies the best among possibly millions of them for visualization. Compared to techniques widely applied in cancer genomics in recent years, including neural networks, support vector machines and various ensemble-based approaches, VizRank is fast and finds visualization models that domain experts can easily examine and interpret. In our experiments on a number of gene expression data sets, VizRank was always able to find data visualizations with a small number of genes (two to seven) and excellent class separation. In addition to providing grounds for gene expression cancer diagnosis, VizRank and its visualizations also identify small sets of relevant genes, uncover interesting gene interactions and point to outliers and potential misclassifications in cancer data sets.
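
The abstract does not say how VizRank scores a projection, so the following is only a minimal sketch of the general idea of ranking candidate scatterplots by class separation. It assumes, purely for illustration, that a projection is scored by the leave-one-out accuracy of a k-nearest-neighbour classifier applied to the two plotted features. The function names `knn_loo_score` and `rank_scatterplots` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def knn_loo_score(points, labels, k=3):
    """Leave-one-out k-NN accuracy on labelled 2-D points.

    Assumption: this stands in for a projection score; the paper's
    actual scoring function may differ.
    """
    correct = 0
    for i, (xi, yi) in enumerate(points):
        # Squared Euclidean distance from point i to every other point.
        dists = sorted(
            ((xi - xj) ** 2 + (yi - yj) ** 2, labels[j])
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        # Majority vote among the k nearest neighbours.
        votes = Counter(label for _, label in dists[:k])
        if votes.most_common(1)[0][0] == labels[i]:
            correct += 1
    return correct / len(points)


def rank_scatterplots(rows, labels, k=3):
    """Score every feature pair as a scatterplot, best score first."""
    n_features = len(rows[0])
    ranked = []
    for a, b in combinations(range(n_features), 2):
        projection = [(row[a], row[b]) for row in rows]
        ranked.append((knn_loo_score(projection, labels, k), (a, b)))
    # Sort by descending score, then by feature indices for determinism.
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return ranked


# Toy data (made up): features 0 and 1 separate the two classes,
# feature 2 varies widely but carries no class information.
rows = [
    (1.0, 1.2, 5.0), (1.1, 0.9, 1.0), (0.9, 1.0, 5.1), (1.2, 1.1, 1.1),
    (3.0, 3.1, 5.2), (3.1, 2.9, 0.9), (2.9, 3.0, 4.9), (3.2, 3.2, 1.2),
]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]

ranked = rank_scatterplots(rows, labels)
for score, pair in ranked:
    print(pair, round(score, 2))
```

With thousands of genes the number of feature pairs runs into the millions, so an exhaustive loop like this one would normally be combined with some form of feature preselection or search heuristic. The sketch leaves that part out.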