ABSTRACT
We propose an extension to the semistructured data model that captures and integrates information about the quality of the stored data. Specifically, we describe the main challenges involved in measuring and representing data quality, and how we addressed them. These challenges include extending an existing data model to include quality metadata, identifying useful quality measures, and devising a way to compute and update the value of the quality measures as data is queried and updated. Although our approach can be generalized to various other domains, it is currently aimed at describing the quality of biological data sources. We illustrate the benefits of our model using several examples from biological databases.
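The following minimal Python sketch illustrates what quality-annotated semistructured data might look like. It is not the model proposed in this work. Its assumptions are:

- data is a tree of labeled nodes;
- there is a single hypothetical measure, `confidence`, scored in [0, 1];
- a hypothetical propagation rule gives each query result the lowest confidence found along the path that reaches it.

The names `Node`, `query_path`, and `update_value` are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Node:
    """A labeled node in a semistructured tree, annotated with quality scores.

    Hypothetical: `quality` maps a measure name (here only "confidence")
    to a score in [0, 1]. A missing score is treated as 1.0.
    """
    label: str
    value: Any = None
    children: List["Node"] = field(default_factory=list)
    quality: Dict[str, float] = field(default_factory=dict)

    def child(self, label: str) -> Optional["Node"]:
        # Return the first child reached by an edge with this label, if any.
        for c in self.children:
            if c.label == label:
                return c
        return None


def query_path(root: Node, path: List[str],
               measure: str = "confidence") -> Optional[Tuple[Any, float]]:
    """Follow `path` from `root`; return (value, derived score) or None.

    Assumed rule: the derived score is the minimum of `measure` over
    every node visited, including the root.
    """
    node = root
    score = root.quality.get(measure, 1.0)
    for label in path:
        node = node.child(label)
        if node is None:
            return None
        score = min(score, node.quality.get(measure, 1.0))
    return node.value, score


def update_value(root: Node, path: List[str], value: Any,
                 confidence: float) -> None:
    """Write `value` at `path`, creating missing nodes, and record its confidence."""
    node = root
    for label in path:
        nxt = node.child(label)
        if nxt is None:
            nxt = Node(label)
            node.children.append(nxt)
        node = nxt
    node.value = value
    node.quality["confidence"] = confidence


# Usage: a toy record with hypothetical annotations.
record = Node("entry", quality={"confidence": 0.9}, children=[
    Node("organism", "Homo sapiens", quality={"confidence": 1.0}),
    Node("function", "kinase", quality={"confidence": 0.6}),
])
print(query_path(record, ["function"]))  # ('kinase', 0.6)
update_value(record, ["function"], "phosphatase", 0.8)
print(query_path(record, ["function"]))  # ('phosphatase', 0.8)
```

Taking the minimum is a conservative choice: a result is treated as no more reliable than the weakest annotation it depends on. Other aggregation rules, such as averages or weighted products, would fit the same structure.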