ABSTRACT
Gene function discovery is an important problem in the computational analysis of microarray data. In this paper, we investigate the use of a semi-supervised learning algorithm for inferring gene functional classifications from a heterogeneous data set consisting of DNA microarray expression measurements and phylogenetic profiles derived from whole-genome sequence comparisons. The semi-supervised approach uses a co-updating method that makes use of both labeled and unlabeled data, and aims to minimize the disagreement between the individual models built from each separate information source. Our results suggest that the semi-supervised approach could be used for gene functional classification. The data sets and the program code used for the experiments are available from our webpage.
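The abstract does not spell out the co-updating rule, so the following is only a rough sketch under stated assumptions, not the authors' algorithm. It assumes a co-regularization style scheme: one logistic-regression model per information source (for example, one on expression features and one on phylogenetic-profile features). Each model minimizes its log-loss on the labeled genes plus a penalty `lam * (p - q)^2` on the unlabeled genes, where `q` is the other model's current prediction, held fixed. The two models are updated in alternation. The names `co_update`, `predict`, `lam`, `lr` and `epochs` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, x):
    # w[0] is the bias term; w[1:] are the feature weights.
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


def co_update(view_a, view_b, labels, n_labeled, lam=0.5, lr=0.1, epochs=200):
    """Hypothetical co-updating sketch for two views of the same genes.

    view_a, view_b: feature vectors for the same genes, in the same order
                    (e.g. expression data and phylogenetic profiles).
    labels:         0/1 class labels for the first n_labeled genes;
                    the remaining genes are treated as unlabeled.
    """
    wa = [0.0] * (len(view_a[0]) + 1)
    wb = [0.0] * (len(view_b[0]) + 1)
    for _ in range(epochs):
        # Alternate: update model A against B's predictions, then B against A.
        for w, own, other, w_other in ((wa, view_a, view_b, wb),
                                       (wb, view_b, view_a, wa)):
            grad = [0.0] * len(w)
            for i, x in enumerate(own):
                p = predict(w, x)
                if i < n_labeled:
                    # Gradient of the log-loss with respect to the logit.
                    g = p - labels[i]
                else:
                    # Gradient of lam * (p - q)^2 with respect to the logit,
                    # treating the other view's prediction q as fixed.
                    q = predict(w_other, other[i])
                    g = lam * 2.0 * (p - q) * p * (1.0 - p)
                grad[0] += g
                for j, xj in enumerate(x):
                    grad[j + 1] += g * xj
            # Averaged gradient step, applied in place so wa/wb are updated.
            for j in range(len(w)):
                w[j] -= lr * grad[j] / len(own)
    return wa, wb
```

In a toy run with two one-dimensional views, two labeled genes and four unlabeled ones, both models end up agreeing on the sign of each gene. The disagreement penalty is what lets each view's predictions on unlabeled genes inform the other view's model.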