A comparative study of citations and links in document classification
Source: International Conference on Digital Libraries
Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries
Chapel Hill, NC, USA
Session: Classification and links
Pages: 75-84
Year of Publication: 2006
ISBN: 1-59593-354-9
Authors
Thierson Couto, University of Minas Gerais, Belo Horizonte, Brazil
Marco Cristo, University of Minas Gerais, Belo Horizonte, Brazil
Marcos André Gonçalves, University of Minas Gerais, Belo Horizonte, Brazil
Pável Calado, IST/INESC-ID, Lisboa, Portugal
Nivio Ziviani, University of Minas Gerais, Belo Horizonte, Brazil
Edleno Moura, Federal University of Amazonas, Manaus, Brazil
Berthier Ribeiro-Neto, Federal University of Minas Gerais, Belo Horizonte, Brazil, and Google Engineering, Belo Horizonte, Brazil
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 weeks): 15; Downloads (12 months): 98; Citation count: 2
DOI: http://doi.acm.org/10.1145/1141753.1141766

ABSTRACT

It is well known that links are an important source of information when dealing with Web collections. However, it remains an open question whether the techniques used on the Web can be applied to collections of scientific papers connected by citations. In this work we present a comparative study of digital library citations and Web links in the context of automatic text classification, and we show that citations and links do in fact behave differently in this context. For the comparison, we run a series of experiments using a digital library of computer science papers and a Web directory. In our reference collections, measures based on co-citation tend to perform better for pages in the Web directory, with gains of up to 37% over text-based classifiers, while measures based on bibliographic coupling perform better in the digital library. We also propose a simple and effective way of combining a traditional text-based classifier with a citation-link-based classifier. This combination is based on the notion of classifier reliability and yielded gains of up to 14% in micro-averaged F1 in the Web collection. However, no significant gain was obtained in the digital library. Finally, we performed a user study to further investigate the causes of these results. We found that the documents misclassified by the citation-link-based classifiers are in fact difficult cases, hard to classify even for humans.
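To make the two link-based measures concrete, the following is a minimal sketch that computes bibliographic coupling (Kessler, 1963: the number of references two documents share) and co-citation (Small, 1973: the number of documents that cite both) over a toy citation graph. It then uses either measure as the similarity in a small kNN vote. The graph, the class labels, the hypothetical `knn_predict` helper, and its weighting scheme are illustrative assumptions, not the authors' classifier. The abstract does not say how similarities are normalized or how the reliability-based combination works, so neither is modeled here. A `micro_f1` helper shows how micro-averaged F1 pools counts across classes.

```python
from collections import Counter

# Hypothetical toy citation graph: each document maps to the set of documents it cites.
cites = {
    "A": {"X", "Y", "Z"},
    "B": {"X", "Y"},
    "C": {"Z"},
    "D": {"A", "B"},
    "E": {"A", "B", "C"},
}


def bibliographic_coupling(a, b, cites):
    """Number of references that documents a and b have in common."""
    return len(cites.get(a, set()) & cites.get(b, set()))


def co_citation(a, b, cites):
    """Number of documents whose reference lists contain both a and b."""
    return sum(1 for refs in cites.values() if a in refs and b in refs)


def knn_predict(doc, labels, sim, cites, k=3):
    """Hypothetical kNN: similarity-weighted vote of the k most similar labeled docs.

    Returns None when no labeled document has nonzero similarity to `doc`.
    """
    scored = [(sim(doc, other, cites), other) for other in labels if other != doc]
    scored = sorted(((s, o) for s, o in scored if s > 0), reverse=True)
    votes = Counter()
    for score, other in scored[:k]:
        votes[labels[other]] += score
    return votes.most_common(1)[0][0] if votes else None


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP and FN over all classes, then compute F1.

    Assumes single-label classification: one true and one predicted label per document.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    errors = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    # Each error is a false positive for the predicted class and a false negative
    # for the true class, so the pooled FP and FN counts both equal the error count.
    fp = fn = errors
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

On this toy graph, `knn_predict("A", {"B": "ir", "C": "db"}, co_citation, cites)` picks `"ir"`, because A is co-cited twice with B (by D and E) but only once with C. Swapping in `bibliographic_coupling` gives the same label here. On real collections, however, the abstract reports that the two measures favor different kinds of collections.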

