CiteSeerχ: a scalable autonomous scientific digital library
Source: ACM International Conference Proceeding Series, Vol. 152
Proceedings of the 1st International Conference on Scalable Information Systems
Hong Kong
Article No. 18
Year of Publication: 2006
ISBN: 1-59593-428-6
Authors
Huajing Li  Pennsylvania State University, PA
Isaac G. Councill  Pennsylvania State University, PA
Levent Bolelli  Pennsylvania State University, PA
Ding Zhou  Pennsylvania State University, PA
Yang Song  Pennsylvania State University, PA
Wang-Chien Lee  Pennsylvania State University, PA
Anand Sivasubramaniam  Pennsylvania State University, PA
C. Lee Giles  Pennsylvania State University, PA
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1146847.1146865

ABSTRACT

CiteSeer is a scientific literature digital library and search engine that automatically crawls and indexes scientific documents in the fields of computer and information science. Since its inception in 1997, CiteSeer has grown to index over 730,000 documents and serves over 800,000 requests daily, pushing the limits of the current system's capabilities. In addition, CiteSeer's monolithic architecture complicates system maintenance and reduces the flexibility of the system in terms of new feature development, algorithm updates, and system interoperability. In this paper, we discuss the problems of the current CiteSeer architecture and propose a new architecture for a next-generation CiteSeer application. The new architecture is based on modular web services and pluggable service components. Preliminary results based on a prototype system show that the new architecture enhances flexibility, scalability, and performance for CiteSeer. In addition, new services in development for the next-generation CiteSeer system are discussed.
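
To make the idea of "pluggable service components" concrete, the following is a minimal sketch and not the paper's design: the abstract does not describe the actual interfaces, so `ServiceRegistry`, `register`, `call`, and the two toy matching functions are all hypothetical names chosen for illustration. It shows only the general pattern in which a component behind a named service can be replaced without changing the callers.

```python
from typing import Callable, Dict

# Hypothetical type: a service takes a request string and returns a response string.
Service = Callable[[str], str]


class ServiceRegistry:
    """Maps a service name to a replaceable component (illustrative only)."""

    def __init__(self) -> None:
        self._services: Dict[str, Service] = {}

    def register(self, name: str, service: Service) -> None:
        # Registering under an existing name swaps the component in place.
        self._services[name] = service

    def call(self, name: str, request: str) -> str:
        if name not in self._services:
            raise KeyError(f"no service registered under {name!r}")
        return self._services[name](request)


def exact_title_match(query: str) -> str:
    # Toy component: normalizes the query and tags it as an exact match.
    return f"exact:{query.strip().lower()}"


def prefix_title_match(query: str) -> str:
    # Toy replacement component: keeps only the first four normalized characters.
    return f"prefix:{query.strip().lower()[:4]}"


registry = ServiceRegistry()
registry.register("search", exact_title_match)
first = registry.call("search", "  CiteSeer ")

# Swap the algorithm behind the same service name; the caller is unchanged.
registry.register("search", prefix_title_match)
second = registry.call("search", "  CiteSeer ")
```

The caller addresses the service only by name, so an algorithm update amounts to a new `register` call. In a web-service setting the same indirection would presumably sit behind a network endpoint rather than an in-process dictionary.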


