ABSTRACT
The web has greatly improved access to scientific literature. However, scientific articles on the web are largely disorganized, with research articles spread across archive sites, institution sites, journal sites, and researcher homepages. No index covers all of the available literature, and the major web search engines typically do not index the content of PostScript/PDF documents at all. This paper discusses the creation of digital libraries of scientific literature on the web, including the efficient location of articles, full-text indexing of the articles, autonomous citation indexing, information extraction, display of query-sensitive summaries and citation context, hubs and authorities computation, similar document detection, user profiling, distributed error correction, graph analysis, and detection of overlapping documents. The software for the system is available at no cost for non-commercial use.