ABSTRACT
This paper describes a series of automatic text categorization experiments with case law documents. Cases are categorized into 40 broad, high-level categories. The results are compared with those of an existing operational process that uses Boolean queries manually constructed by domain experts. In this categorization process, recall is considered more important than precision. The paper investigates three algorithms that could potentially automate the process: 1) a nearest neighbor-like algorithm; 2) C4.5rules, a machine learning decision tree algorithm; and 3) Ripper, a machine learning rule induction algorithm. The results obtained by Ripper surpass those of the operational process.