ABSTRACT
CiteSeer is a scientific literature digital library and search engine that automatically crawls and indexes scientific documents in the fields of computer and information science. Since its inception in 1997, CiteSeer has grown to index over 730,000 documents and serve over 800,000 requests daily, pushing the limits of the current system's capabilities. In addition, CiteSeer's monolithic architecture complicates system maintenance and limits the system's flexibility with respect to new feature development, algorithm updates, and interoperability with other systems. In this paper, we discuss the problems of the current CiteSeer architecture and propose a new architecture for a next-generation CiteSeer application. The new architecture is based on modular web services and pluggable service components. Preliminary results from a prototype system show that the new architecture enhances flexibility, scalability, and performance for CiteSeer. We also discuss new services in development for the next-generation CiteSeer system.