Automatic categorization of figures in scientific documents
Source: International Conference on Digital Libraries
Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries
Chapel Hill, NC, USA
Session: Document analysis
Pages: 129 - 138  
Year of Publication: 2006
ISBN:1-59593-354-9
Authors
Xiaonan Lu  The Pennsylvania State University, University Park, Pennsylvania
Prasenjit Mitra  The Pennsylvania State University, University Park, Pennsylvania
James Z. Wang  The Pennsylvania State University, University Park, Pennsylvania
C. Lee Giles  The Pennsylvania State University, University Park, Pennsylvania
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1141753.1141778

ABSTRACT

Figures carry important non-textual information in scientific documents. Current digital libraries do not provide users with tools to retrieve documents based on the information available within figures. We propose an architecture for retrieving documents by integrating figures with other information. The initial step in enabling integrated document search is to categorize figures into a set of predefined types. We propose several categories of figures based on their functions in scholarly articles. We have developed a machine-learning-based approach for automatic categorization of figures. Both global features, such as texture, and part features, such as lines, are used in the architecture to discriminate among figure categories. The proposed approach has been evaluated on a testbed document set collected from the CiteSeer scientific literature digital library. Experimental evaluation has demonstrated that our algorithms can produce acceptable results for real-world use. Our tools will be integrated into a scientific document digital library.
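
The idea of combining global features with part features can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: figures are tiny binary pixel grids, the global features are ink density and a horizontal transition rate (a crude stand-in for texture), the part features are the longest horizontal and vertical ink runs (a crude stand-in for lines), the category labels "plot" and "photo" are hypothetical, and a nearest-centroid rule replaces whatever learner the authors actually used.

```python
from math import dist


def global_features(img):
    """Whole-image descriptors: ink density and horizontal transition rate."""
    h, w = len(img), len(img[0])
    ink = sum(sum(row) for row in img) / (h * w)
    changes = sum(row[x] != row[x + 1] for row in img for x in range(w - 1))
    return [ink, changes / (h * (w - 1))]


def longest_run(seq):
    """Length of the longest run of consecutive ink (truthy) pixels."""
    best = cur = 0
    for v in seq:
        cur = cur + 1 if v else 0
        best = max(best, cur)
    return best


def part_features(img):
    """Line-like descriptors: longest horizontal and vertical runs, normalized."""
    h, w = len(img), len(img[0])
    horiz = max(longest_run(row) for row in img) / w
    vert = max(longest_run(col) for col in zip(*img)) / h
    return [horiz, vert]


def features(img):
    """Concatenate global and part features into one vector."""
    return global_features(img) + part_features(img)


def train_centroids(labeled):
    """labeled: list of (image, label). Returns label -> mean feature vector."""
    sums, counts = {}, {}
    for img, label in labeled:
        f = features(img)
        acc = sums.setdefault(label, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def classify(img, centroids):
    """Assign the label whose centroid is nearest in feature space."""
    f = features(img)
    return min(centroids, key=lambda lab: dist(f, centroids[lab]))


# Hypothetical toy data: an axis-like "plot" and a checkerboard "photo".
plot = [[1, 0, 0, 0, 0, 0] for _ in range(5)] + [[1, 1, 1, 1, 1, 1]]
photo = [[(x + y) % 2 for x in range(6)] for y in range(6)]
centroids = train_centroids([(plot, "plot"), (photo, "photo")])

# A plot with one extra data point should still land in the "plot" category.
query = [row[:] for row in plot]
query[2][3] = 1
print(classify(query, centroids))
```

The long axis lines give the query high run-length values, while the checkerboard scores high on transitions and low on runs, so the two toy categories separate cleanly. Real figures would need far richer features and a properly trained classifier.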


