|
ABSTRACT
Graph-theoretical approaches to biological network analysis have proven to be effective for small networks but are computationally infeasible for comprehensive genome-scale systems-level elucidation of these networks. The difficulty lies in the NP-hard nature of many global systems biology problems that, in practice, translates to exponential (or worse) run times for finding exact optimal solutions. Moreover, these problems, especially those of an enumerative flavor, are often memory-intensive and must share very large sets of data effectively across many processors. For example, the enumeration of maximal cliques - a core component in gene expression networks analysis, cis regulatory motif finding, and the study of quantitative trait loci for high-throughput molecular phenotypes can result in as many as 3^n/3 maximal cliques for a graph with n vertices. Memory requirements to store those cliques reach terabyte scales even on modest-sized genomes. Emerging hardware architectures with ultra-large globally addressable memory such as the SGI Altix and Cray X1 seem to be well suited for addressing these types of data-intensive problems in systems biology. This paper presents a novel framework that provides exact, parallel and scalable solutions to various graph-theoretical approaches to genome-scale elucidation of biological networks. This framework takes advantage of these large-memory architectures by creating globally addressable bitmap memory indices with potentially high compression rates, fast bitwise-logical operations, and reduced search space. Augmented with recent theoretical advancements based on fixed-parameter tractability, this framework produces computationally feasible performance for genome-scale combinatorial problems of systems biology.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
[1] F. N. Abu-Khzam, N. E. Baldwin, M. A. Langston, and N. F. Samatova, On the Relative Efficiency of Maximal Clique Enumeration Algorithms, with Application to High-Throughput Computational Biology, Proceedings, International Conference on Research Trends in Science and Technology, Beirut, Lebanon, 2005.
|
| |
2
|
[2] P. Karp, EcoCyc version 2.7, vol. 7 (5), Jan-Mar, 1996, 8-8.
|
| |
3
|
[3] R. Overbeek, N. Larsen, G. D. Pusch, M. D'Souza, E. Selkov, N. Kyrpides, M. Fonstein, N. Maltsev, and E. Selkov, WIT: integrated system for high-throughput genome sequence analysis and metabolic reconstruction, vol. 28 (1), Jan 1, 2000, 123-125.
|
| |
4
|
[4] E. Selkov, Y. Grechkin, N. Mikhailova, and E. Selkov, MPW: the Metabolic Pathways Database, vol. 26 (1), Jan 1, 1998, 43-45.
|
| |
5
|
[5] D. A. Fell and H. M. Sauro, Metabolic Control Analysis By Computer-Progress And Prospects, vol. 49 (8-9), 1990, 811-816.
|
| |
6
|
[6] C. H. Schilling, J. S. Edwards, D. Letscher, and B. O. Palsson, Combining pathway analysis with flux balance analysis for the comprehensive study of metabolic systems, vol. 71 (4), 2000, 286-306.
|
| |
7
|
[7] C. H. Schilling, J. S. Edwards, and B. O. Palsson, Toward metabolic phenomics: Analysis of genomic data using flux balances, vol. 15 (3), May-Jun, 1999, 288-295.
|
| |
8
|
[8] S. Schuster, T. Dandekar, and D. A. Fell, Detection of elementary flux modes in biochemical networks: a promising tool for pathway analysis and metabolic engineering, vol. 17 (2), Feb, 1999, 53-60.
|
| |
9
|
[9] C. H. Schilling and B. O. Palsson, Assessment of the metabolic capabilities of Haemophilus influenzae Rd through a genome-scale pathway analysis, vol. 203 (3), Apr 7, 2000, 249-283.
|
| |
10
|
[10] S. Schuster, D. A. Fell, and T. Dandekar, A general definition of metabolic pathways useful for systematic organization and analysis of complex metabolic networks, vol. 18 (3), Mar, 2000, 326-332.
|
| |
11
|
[11] T. Pfeiffer, I. Sanchez-Valdenebro, J. C. Nuno, F. Montero, and S. Schuster, METATOOL: for studying metabolic networks, vol. 15 (3), Mar, 1999, 251-257.
|
| |
12
|
[12] C. H. Schilling, D. Letscher, and B. O. Palsson, Theory for the systemic definition of metabolic pathways and their use in interpreting metabolic function from? A pathway-oriented perspective, vol. 203 (3), Apr 7, 2000, 229-248.
|
| |
13
|
[13] M. J. Herrgard, M. W. Covert, and B. O. Palsson, Reconciling gene expression data with known genome-scale regulatory network structures, Genome Research, vol. 13 (11), Nov, 2003, 2423-2434.
|
| |
14
|
[14] G. D. Stormo and K. Tan, Mining genome databases to identify and understand new gene regulatory systems, Current Opinion in Microbiology, vol. 5 (2), Apr, 2002, 149-153.
|
| |
15
|
[15] J. J. Wyrick and R. A. Young, Deciphering gene expression regulatory networks, Current Opinion in Genetics & Development, vol. 12 (2), Apr, 2002, 130-136.
|
| |
16
|
[16] N. E. Baldwin, E. J. Chesler, S. Kirov, M. A. Langston, J. R. Snoddy, R. W. Williams, and B. Zhang, Computational, Integrative and Comparative Methods for the Elucidation of Genetic Co-Expression Networks, Journal of Biomedicine and Biotechnology, vol. 2, 2005, 172-180.
|
| |
17
|
[17] E. J. Chesler, L. Lu, S. Shou, Y. Qu, J. Gu, J. Wang, H. C. Hsu, J. D. Mountz, N. E. Baldwin, M. A. Langston, J. B. Hogenesch, D. W. Threadgill, K. F. Manly, and R. W. Williams, Complex Trait Analysis of Gene Expression Uncovers Polygenic and Pleiotropic Networks that Modulate Nervous System Function, Nature Genetics, vol. 37 (3), 2005, 233- 242.
|
| |
18
|
[18] A. C. Gavin, M. Bosche, R. Krause, P. Grandi, M. Marzioch, A. Bauer, J. Schultz, J. M. Rick, A. M. Michon, C. M. Cruciat, M. Remor, C. Hofert, M. Schelder, M. Brajenovic, H. Ruffner, A. Merino, K. Klein, M. Hudak, D. Dickson, T. Rudi, V. Gnau, A. Bauch, S. Bastuck, B. Huhse, C. Leutwein, M. A. Heurtier, R. R. Copley, A. Edelmann, E. Querfurth, V. Rybin, G. Drewes, M. Raida, T. Bouwmeester, P. Bork, B. Seraphin, B. Kuster, G. Neubauer, and G. Superti-Furga, Functional organization of the yeast proteome by systematic analysis of protein complexes, Nature, vol. 415 (6868), Jan 10, 2002, 141-147.
|
| |
19
|
[19] Y. Ho, A. Gruhler, A. Heilbut, G. D. Bader, L. Moore, S. L. Adams, A. Millar, P. Taylor, K. Bennett, K. Boutilier, L. Y. Yang, C. Wolting, I. Donaldson, S. Schandorff, J. Shewnarane, M. Vo, J. Taggart, M. Goudreault, B. Muskat, C. Alfarano, D. Dewar, Z. Lin, K. Michalickova, A. R. Willems, H. Sassi, P. A. Nielsen, K. J. Rasmussen, J. R. Andersen, L. E. Johansen, L. H. Hansen, H. Jespersen, A. Podtelejnikov, E. Nielsen, J. Crawford, V. Poulsen, B. D. Sorensen, J. Matthiesen, R. C. Hendrickson, F. Gleeson, T. Pawson, M. F. Moran, D. Durocher, M. Mann, C. W. V. Hogue, D. Figeys, and M. Tyers, Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry, Nature, vol. 415 (6868), Jan 10, 2002, 180-183.
|
| |
20
|
[20] K. Truong and M. Ikura, The use of FRET imaging microscopy to detect protein-protein interactions and protein conformational changes in vivo, Current Opinion in Structural Biology, vol. 11 (5), Oct, 2001, 573-578.
|
| |
21
|
[21] P. Uetz, L. Giot, G. Cagney, T. A. Mansfield, R. S. Judson, J. R. Knight, D. Lockshon, V. Narayan, M. Srinivasan, P. Pochart, A. Qureshi-Emili, Y. Li, B. Godwin, D. Conover, T. Kalbfleisch, G. Vijayadamodar, M. J. Yang, M. Johnston, S. Fields, and J. M. Rothberg, A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae, Nature, vol. 403 (6770), Feb 10, 2000, 623-627.
|
| |
22
|
[22] B. P. Kelley, B. B. Yuan, F. Lewitter, R. Sharan, B. R. Stockwell, and T. Ideker, PathBLAST: a tool for alignment of protein interaction networks, vol. 32, Jul 1, 2004, W83-W88.
|
| |
23
|
[23] J. M. Rohwer, S. Schuster, and H. V. Westerhoff, How to recognize monofunctional units in a metabolic system, vol. 179 (3), Apr 7, 1996, 213-228.
|
| |
24
|
|
| |
25
|
[25] E. Tomita, A. Tanaka, and H. Takahashi, The Worst-Case Time Complexity for Generating all Maximal Cliques, Proceedings, Computing and Combinatorics Conference, Jeju Island, Korea, 2004.
|
| |
26
|
[26] F. Kose, W. Weckwerth, T. Linke, and O. Fiehn, Visualizing plant metabolomic correlation networks using clique-metabolite matrices, Bioinformatics, vol. 17, 2001, 1198-1208.
|
| |
27
|
Yun Zhang , Faisal N. Abu-Khzam , Nicole E. Baldwin , Elissa J. Chesler , Michael A. Langston , Nagiza F. Samatova, Genome-Scale Computational Approaches to Memory-Intensive Applications in Systems Biology, Proceedings of the 2005 ACM/IEEE conference on Supercomputing, p.12, November 12-18, 2005
[doi> 10.1109/SC.2005.29]
|
| |
28
|
[28] N. E. Baldwin, R. L. Collins, M. A. Langston, M. R. Leuze, C. T. Symons, and B. H. Voy., High Performance Computational Tools for Motif Discovery, Proceedings, IEEE International Workshop on High Performance Computational Biology (HiCOMB), Santa Fe, New Mexico, 2004.
|
| |
29
|
[29] F. N. Abu-Khzam, F. Cheetham, F. Dehne, M. A. Langston, S. Pitre, A. Rau-Chaplin, P. Shanbhag, and P. J. Taillon, "ClustalXP," http://ClustalXP.cgmlab.org/.
|
| |
30
|
[30] D. Fernández-Baca, "The Perfect Phylogeny Problem," in Steiner Trees in Industry, X. Cheng and D.-Z. Du, Eds.: Springer, 2002).
|
| |
31
|
[31] I. Bomze, M. Budinich, P. Pardalos, and M. Pelillo, "The Maximum Clique Problem," in Handbook of Combinatorial Optimization, vol. 4, D.-Z. Du and P. M. Pardalos, Eds.: Kluwer Academic Publishers, 1999).
|
| |
32
|
[32] J. M. Robson, "Finding a maximum independent set in time O(2n/4)," LaBRI, Universite Bordeaux, Report 1251-01, 2001.
|
| |
33
|
[33] R. G. Downey and M. R. Fellows, Parameterized Complexity. New York: Springer 1999.
|
| |
34
|
|
 |
35
|
|
| |
36
|
|
| |
37
|
[37] L. S. Chandran and F. Grandoni, Refined Memorisation for Vertex Cover, Proceedings, International Workshop on Parameterized and Exact Computation, Bergen, Norway, 2004.
|
| |
38
|
[38] F. N. Abu-Khzam, M. A. Langston, and P. Shanbhag, Scalable Parallel Algorithms for Difficult Combinatorial Problems: A Case Study in Optimization, Proceedings, International Conference on Parallel and Distributed Computing and Systems, Marina Del Rey, California, 2003.
|
| |
39
|
|
 |
40
|
|
| |
41
|
[41] K. K. Tomczak, V. D. Marinescu, M. F. Ramoni, D. Sanoudou, F. Montanaro, M. Han, L. M. Kunkel, I. S. Kohane, and A. H. Beggs, Expression Profiling and Identification of Novel Genes Involved in Myogenic Differentiation, FASEB Journal, vol. 18 (2), 2004, 403-405.
|
| |
42
|
[42] C. Fried, W. Hordijk, S. J. Prohaska, C. R. Stadler, and P. F. Stadler, The Footprint Sorting Problem, J. Chem. Inf. Comput. Sci., vol. 44 (2), 2004, 332-338.
|
| |
43
|
[43] F. Dehne, M. R. Fellows, M. A. Langston, F. A. Rosamond, and K. Stevens, An O*(2O(k)) FPT Algorithm for the Undirected Feedback Vertex Set Problem, Proceedings, International Computing and Combinatorics Conference, Kunming, China, 2005.
|
CITED BY 4
|
|
Michael A. Langston , Andy D. Perkins , Arnold M. Saxton , Jon A. Scharff , Brynn H. Voy, Innovative computational methods for transcriptomic data analysis, Proceedings of the 2006 ACM symposium on Applied computing, April 23-27, 2006, Dijon, France
|
|
|
Yun Zhang , Faisal N. Abu-Khzam , Nicole E. Baldwin , Elissa J. Chesler , Michael A. Langston , Nagiza F. Samatova, Genome-Scale Computational Approaches to Memory-Intensive Applications in Systems Biology, Proceedings of the 2005 ACM/IEEE conference on Supercomputing, p.12, November 12-18, 2005
|
|
|
Reza Azimi , Livio Soares , Michael Stumm , Thomas Walsh , Angela Demke Brown, Path: page access tracking to improve memory management, Proceedings of the 6th international symposium on Memory management, October 21-22, 2007, Montreal, Quebec, Canada
|
|
|
Matthew C. Schmidt , Nagiza F. Samatova , Kevin Thomas , Byung-Hoon Park, A scalable, parallel algorithm for maximal clique enumeration, Journal of Parallel and Distributed Computing, v.69 n.4, p.417-428, April, 2009
|
|