Topes: reusable abstractions for validating data
Source
International Conference on Software Engineering
Proceedings of the 30th International Conference on Software Engineering
Leipzig, Germany
SESSION: Software tools
Pages 1-10  
Year of Publication: 2008
ISBN:978-1-60558-079-1
Authors
Christopher Scaffidi  Carnegie Mellon University, Pittsburgh, PA, USA
Brad Myers  Carnegie Mellon University, Pittsburgh, PA, USA
Mary Shaw  Carnegie Mellon University, Pittsburgh, PA, USA
Sponsors
ACM: Association for Computing Machinery
SIGSOFT: ACM Special Interest Group on Software Engineering
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1368088.1368090

ABSTRACT

Programmers often omit input validation when inputs can appear in many different formats or when validation criteria cannot be precisely specified. To enable validation in these situations, we present a new technique that puts valid inputs into a consistent format and that identifies "questionable" inputs which might be valid or invalid, so that these values can be double-checked by a person or a program. Our technique relies on the concept of a "tope", which is an application-independent abstraction describing how to recognize and transform values in a category of data. We present our definition of topes and describe a development environment that supports the implementation and use of topes. Experiments with web application and spreadsheet data indicate that using our technique improves the accuracy and reusability of validation code and also improves the effectiveness of subsequent data cleaning such as duplicate identification.
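
The idea of an abstraction that recognizes values in several formats, flags "questionable" ones, and transforms valid ones into a consistent format can be illustrated with a minimal sketch. This is not the authors' implementation: the `Tope` class, the scoring scheme (1.0 valid, 0.0 invalid, anything between questionable), the phone-number formats, and the area-code rule are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class Tope:
    """Hypothetical container: named formats plus transforms between them."""
    name: str
    # format name -> recognizer returning a score in [0, 1]
    recognizers: Dict[str, Callable[[str], float]]
    # (source format, target format) -> transform function
    transforms: Dict[Tuple[str, str], Callable[[str], str]] = field(default_factory=dict)

    def best_format(self, value: str) -> Tuple[str, float]:
        """Return the format whose recognizer gives the highest score."""
        scores = {fmt: rec(value) for fmt, rec in self.recognizers.items()}
        fmt = max(scores, key=scores.get)
        return fmt, scores[fmt]

    def validate(self, value: str, target: str) -> Tuple[str, str]:
        """Classify a value and, if it is not invalid, put it in the target format."""
        fmt, score = self.best_format(value)
        if score == 0.0:
            return "invalid", value
        if fmt != target:
            value = self.transforms[(fmt, target)](value)
        status = "valid" if score == 1.0 else "questionable"
        return status, value


# Two assumed formats for an illustrative "phone number" category.
DASHED = re.compile(r"(\d{3})-(\d{3})-(\d{4})")
PAREN = re.compile(r"\((\d{3})\) (\d{3})-(\d{4})")


def recognizer(pattern: re.Pattern) -> Callable[[str], float]:
    def recognize(value: str) -> float:
        match = pattern.fullmatch(value)
        if match is None:
            return 0.0
        # Assumed soft constraint: an area code starting with 0 or 1 is
        # unusual, so the value is questionable rather than rejected.
        return 0.5 if match.group(1)[0] in "01" else 1.0
    return recognize


def paren_to_dashed(value: str) -> str:
    return "-".join(PAREN.fullmatch(value).groups())


def dashed_to_paren(value: str) -> str:
    area, prefix, line = DASHED.fullmatch(value).groups()
    return f"({area}) {prefix}-{line}"


phone = Tope(
    name="phone number",
    recognizers={"dashed": recognizer(DASHED), "paren": recognizer(PAREN)},
    transforms={("paren", "dashed"): paren_to_dashed,
                ("dashed", "paren"): dashed_to_paren},
)

print(phone.validate("(412) 555-0100", "dashed"))
print(phone.validate("012-555-0100", "dashed"))
print(phone.validate("hello", "dashed"))
```

In this sketch, a caller would accept "valid" results, reject "invalid" ones, and route "questionable" ones to a person or a second program for double-checking, which mirrors the three-way outcome the abstract describes.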

