Active learning for e-rulemaking: public comment categorization
Source
dg.o; Vol. 289
Proceedings of the 2008 international conference on Digital government research
Montreal, Canada
SESSION: Research papers and management, case study & policy papers: e-rulemaking and ontologies
Pages 234-243
Year of Publication: 2008
ISBN:978-1-60558-099-9
Authors
Stephen Purpura, Cornell University, Ithaca, NY
Claire Cardie, Cornell University, Ithaca, NY
Jesse Simons, Cornell University, Ithaca, NY
Sponsors
Routledge
Elsevier
Springer
Cefrio
NCDG : National Center for Digital Government
Publisher
Bibliometrics
Downloads (6 Weeks): 6,   Downloads (12 Months): 34,   Citation Count: 0

ABSTRACT

We address the e-rulemaking problem of reducing the manual labor required to analyze public comment sets. In current and previous work, text categorization techniques have been used to speed up the comment analysis phase of e-rulemaking by automatically classifying sentences according to the rule-specific issues [2] or general topics [7, 8] that they address. Manually annotated data, however, is still required to train the supervised inductive learning algorithms that perform the categorization. This paper therefore investigates active learning methods for public comment categorization: we develop two new, general-purpose active learning techniques that selectively sample from the available training data for human labeling when building the sentence-level classifiers used in public comment categorization. Using an e-rulemaking corpus developed for our purposes [2], we compare our methods to the well-known query by committee (QBC) active learning algorithm [5] and to a baseline that randomly selects instances for labeling in each round of active learning. We show that our methods outperform both the random-selection learner and the QBC variation by a statistically significant margin, requiring many fewer training examples to reach the same levels of accuracy on a held-out test set. This provides promising evidence that automated text categorization methods might be used effectively to support public comment analysis.
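
The two comparison strategies named in the abstract, query by committee and random selection, can be illustrated with a minimal sketch. The abstract does not describe the paper's two new techniques or its exact QBC variation, so none of that is reproduced here. The sketch assumes vote entropy as the committee disagreement measure (one common choice, not necessarily the paper's), and the names `vote_entropy`, `select_qbc`, `select_random`, the toy keyword classifiers, and the example sentences are all hypothetical.

```python
import math
import random
from collections import Counter


def vote_entropy(votes):
    """Committee disagreement on one instance: entropy of its label votes."""
    total = len(votes)
    counts = Counter(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def select_qbc(pool, committee, batch_size):
    """Return indices of the pool items the committee disagrees on most."""
    scored = [
        (vote_entropy([member(x) for member in committee]), i)
        for i, x in enumerate(pool)
    ]
    # Highest disagreement first; ties broken by pool index for determinism.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored[:batch_size]]


def select_random(pool, batch_size, rng):
    """Baseline: return indices chosen uniformly at random, without repeats."""
    return rng.sample(range(len(pool)), batch_size)


# Hypothetical committee: three toy keyword classifiers (assumption, for
# illustration only; a real committee would be trained models).
committee = [
    lambda s: "cost" if "cost" in s else "other",
    lambda s: "cost" if "price" in s or "cost" in s else "other",
    lambda s: "cost" if "expensive" in s or "cost" in s else "other",
]

# Hypothetical unlabeled pool of comment sentences.
pool = [
    "the cost is too high",   # all members agree
    "the price will rise",    # members disagree
    "i support this rule",    # all members agree
]

qbc_pick = select_qbc(pool, committee, 1)
random_pick = select_random(pool, 2, random.Random(0))
```

In each round of active learning, the selected indices would be sent to a human annotator, the labeled sentences added to the training set, and the classifiers retrained before the next selection. Only the selection step is shown here.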


REFERENCES

 
[1] K. Brinker. Incorporating diversity in active learning with support vector machines. In Proceedings of ICML-03, 20th International Conference on Machine Learning. Morgan Kaufmann Publishers, San Francisco, US, 2003.
[2]
[3] C. Coglianese. Weak democracy, strong information: The role of information technology in the rulemaking process. In V. Mayer-Schoenberger and D. Lazer, editors, Electronic Government to Information Government: Governing in the 21st Century, 2007.
[4]
[5]
[6] C. Kerwin. The state of rulemaking in the federal government. Technical report, Transcript Panel 1, 2005.
[7]
[8]
[9] D. D. Lewis and J. Catlett. Heterogeneous Uncertainty Sampling for Supervised Learning. In Proceedings of the Eleventh International Conference on Machine Learning, pages 148-156, Rutgers University, New Brunswick, NJ, 1994. Morgan Kaufmann.
[10]
[11]
[12]
[13] M. F. Porter. An algorithm for suffix stripping. Program, 14(3):130-137, 1980.
[14]
[15]
[16]
[17]
[18] S. Shulman. Perverse incentives: The case against mass e-mail campaigns. In Proceedings of the Annual Meeting of the American Political Science Association, 2008.
[19] P. Strauss, T. Rakoff, and C. Farina. Administrative Law. 10th edition, 2003.
[20]
[21]
[22]
