ABSTRACT
U.S. regulatory agencies are required to solicit, consider, and respond to public comments before issuing regulations. In recent years, agencies have begun to accept comments via both email and Web forms. The transition from paper to electronic comments makes it much easier for individuals to customize "form" letters, and many do, creating "near-duplicate" comments that express the same viewpoint in slightly different language. This paper explores the use of simple text clustering and retrieval algorithms for identifying near-duplicate public comments. Experiments with public comments about a recent regulation proposed by the Environmental Protection Agency (EPA) demonstrate the effectiveness of the algorithms.