Multi-evidence, multi-criteria, lazy associative document classification
Source: Conference on Information and Knowledge Management
Proceedings of the 15th ACM international conference on Information and knowledge management
Arlington, Virginia, USA
SESSION: Classification - 1
Pages: 218 - 227  
Year of Publication: 2006
ISBN:1-59593-433-2
Authors
Adriano Veloso  Federal University of Minas Gerais, Belo Horizonte, Brazil
Wagner Meira, Jr.  Federal University of Minas Gerais, Belo Horizonte, Brazil
Marco Cristo  Federal University of Minas Gerais, Belo Horizonte, Brazil
Marcos Gonçalves  Federal University of Minas Gerais, Belo Horizonte, Brazil
Mohammed Zaki  Rensselaer Polytechnic Institute, Troy
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1183614.1183649

ABSTRACT

We present a novel approach for classifying documents that transparently combines different pieces of evidence (e.g., textual features of documents, links, and citations) through a data mining technique that generates rules associating these pieces of evidence with predefined classes. These rules can contain any number and mixture of the available evidence and are associated with several quality criteria, which can be used in conjunction to choose the "best" rule to apply at classification time. Our method can perform evidence enhancement by link forwarding/backwarding (i.e., navigating among documents related through citation), so that new pieces of link-based evidence are derived when necessary. Furthermore, instead of inducing a single model (or rule set) that is good on average for all predictions, the proposed approach employs a lazy method that delays the inductive process until a document is given for classification, thereby taking advantage of better qualitative evidence coming from the document. We conducted a systematic evaluation of the proposed approach using documents from the ACM Digital Library and from a Brazilian Web directory. In both collections, our approach outperformed all classifiers based on the best available evidence in isolation, as well as state-of-the-art multi-evidence classifiers. We also evaluated our approach on the standard WebKB collection, where it showed gains of 1% in accuracy while being 25 times faster. Further, our approach is extremely efficient in terms of computational performance, showing gains of more than one order of magnitude when compared against other multi-evidence classifiers.
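
The lazy, rule-based idea described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm. It is a minimal illustration, assuming that each document is a set of evidence items tagged by type (the `text:` and `cites:` prefixes are invented here), that rule quality is judged by confidence first and support second, and that shorter rules win remaining ties. The function name `lazy_classify` and its parameters are hypothetical.

```python
from collections import Counter
from itertools import combinations


def lazy_classify(training, test_features, max_rule_size=2, min_support=1):
    """Pick the best rule X -> label for one test document (illustrative only).

    training: list of (set_of_evidence_items, label) pairs.
    Rules are induced only at classification time, and only from
    evidence items that the test document actually contains.
    """
    test_features = set(test_features)

    # Lazy step: project the training set onto the test document's evidence.
    projected = [(feats & test_features, label) for feats, label in training]
    projected = [(feats, label) for feats, label in projected if feats]

    best = None  # (confidence, support, -rule_size, rule_items, label)
    for size in range(1, max_rule_size + 1):
        for itemset in combinations(sorted(test_features), size):
            items = set(itemset)
            # Count class labels among training documents covered by the rule body.
            labels = Counter(label for feats, label in projected if items <= feats)
            covered = sum(labels.values())
            if covered == 0:
                continue
            for label, support in labels.items():
                if support < min_support:
                    continue
                confidence = support / covered
                candidate = (confidence, support, -size, itemset, label)
                # Assumed multi-criteria ordering: confidence, support, brevity.
                if best is None or candidate[:3] > best[:3]:
                    best = candidate

    if best is None:
        return None
    confidence, support, _, itemset, label = best
    return {"label": label, "rule": itemset,
            "confidence": confidence, "support": support}


# Toy data mixing textual and citation evidence (all values are made up).
training = [
    ({"text:mining", "text:rules", "cites:d1"}, "databases"),
    ({"text:mining", "cites:d1"}, "databases"),
    ({"text:mining", "text:retrieval", "cites:d2"}, "ir"),
    ({"text:retrieval", "cites:d2"}, "ir"),
]
result = lazy_classify(training, {"text:mining", "cites:d2", "text:unknown"})
print(result)
```

In this toy run the citation item alone yields a rule with higher confidence than the textual item, which is the kind of situation where mixing evidence types helps. The thresholds and the ordering of criteria are assumptions chosen for brevity, and the sketch omits the link forwarding/backwarding step entirely.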

