BIO-AJAX: an extensible framework for biological data cleaning
Source: ACM SIGMOD Record
Volume 33, Issue 2 (June 2004)
SPECIAL ISSUE: Data engineering for life sciences
Pages: 51-57
Year of Publication: 2004
ISSN: 0163-5808
Authors
Katherine G. Herbert  University Heights, Newark, NJ
Narain H. Gehani  University Heights, Newark, NJ
William H. Piel  State University of New York at Buffalo, Buffalo, NY
Jason T. L. Wang  University Heights, Newark, NJ
Cathy H. Wu  Georgetown University Medical Center, NW, Washington
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1024694.1024703

ABSTRACT

As databases become more pervasive through the biological sciences, various data quality issues regarding data legacy, data uniformity and data duplication arise. Due to the nature of this data, each of these problems is non-trivial. For biological data to be corrected and standardized, new methods and frameworks must be developed. This paper proposes one such framework, called BIO-AJAX, which uses principles from data cleaning to improve data quality in biological information systems, specifically in TreeBASE.


