Near-duplicate detection for eRulemaking
Source dg.o; Vol. 89
Proceedings of the 2005 national conference on Digital government research
Atlanta, Georgia
SESSION: E-rulemaking
Pages: 78 - 86
Year of Publication: 2005
Authors
Hui Yang  Carnegie Mellon University, Pittsburgh, PA
Jamie Callan  Carnegie Mellon University, Pittsburgh, PA
Sponsor
NSF : National Science Foundation

ABSTRACT

U.S. regulatory agencies are required to solicit, consider, and respond to public comments before issuing regulations. In recent years, agencies have begun to accept comments via both email and Web forms. The transition from paper to electronic comments makes it much easier for individuals to customize "form" letters, which they do, creating "near-duplicate" comments that express the same viewpoint in slightly different language. This paper explores the use of simple text clustering and retrieval algorithms for identifying near-duplicate public comments. Experiments with public comments about a recent regulation proposed by the Environmental Protection Agency (EPA) demonstrate the effectiveness of the algorithms.
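
To make the idea of grouping near-duplicate comments concrete, the following is a minimal illustrative sketch, not the authors' algorithm. It assumes word 3-gram "shingles", Jaccard overlap as the similarity measure, and a greedy single-pass clustering with an arbitrary threshold of 0.5; the function names (`shingles`, `jaccard`, `cluster_near_duplicates`) and the sample comments are hypothetical.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_near_duplicates(comments, threshold=0.5, k=3):
    """Greedy single-pass clustering (an assumed, simplified scheme).

    Each comment joins the first cluster whose seed comment is at least
    `threshold` similar; otherwise it starts a new cluster. Returns a list
    of clusters, each a list of indices into `comments`.
    """
    seeds = []     # shingle set of the first comment in each cluster
    clusters = []  # member indices, parallel to `seeds`
    for idx, text in enumerate(comments):
        s = shingles(text, k)
        for seed, members in zip(seeds, clusters):
            if jaccard(s, seed) >= threshold:
                members.append(idx)
                break
        else:
            seeds.append(s)
            clusters.append([idx])
    return clusters


if __name__ == "__main__":
    sample = [
        "Please protect our air and water from mercury pollution.",
        "Please protect our air and water from mercury pollution now.",
        "I support the proposed rule as written.",
    ]
    print(cluster_near_duplicates(sample))
```

In this sketch the first two sample comments, which differ by one appended word, fall into the same cluster, while the unrelated third comment forms its own. The threshold and shingle length are tuning choices made here for illustration only.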

