ABSTRACT
The invention of the hyperlink and the HTTP transmission protocol caused an amazing new structure to appear on the Internet -- the World Wide Web. With the Web, there came spiders, robots, and Web crawlers, which go from one link to the next checking Web health, ferreting out information and resources, and imposing organization on the huge collection of information (and dross) residing on the net. This paper reports on the use of one such crawler to synthesize document collections on various topics in science, mathematics, engineering and technology. Such collections could be part of a digital library.
CITED BY 12
Pável P. Calado, Marcos A. Gonçalves, Edward A. Fox, Berthier Ribeiro-Neto, Alberto H. F. Laender, Altigran S. da Silva, Davi C. Reis, Pablo A. Roberto, Monique V. Vieira, Juliano P. Lage, The Web-DL environment for building digital libraries from the Web, Proceedings of the 3rd ACM/IEEE-CS joint conference on Digital libraries, May 27-31, 2003, Houston, Texas
Gautam Pant, Kostas Tsioutsiouliklis, Judy Johnson, C. Lee Giles, Panorama: extending digital libraries with topical crawlers, Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries, June 07-11, 2004, Tucson, AZ, USA
William Y. Arms, Selcuk Aya, Pavel Dmitriev, Blazej J. Kot, Ruth Mitchell, Lucia Walle, Building a research library for the history of the web, Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries, June 11-15, 2006, Chapel Hill, NC, USA
Robert G. Capra, Christopher A. Lee, Gary Marchionini, Terrell Russell, Chirag Shah, Fred Stutzman, Selection and context scoping for digital video collections: an investigation of YouTube and blogs, Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries, June 16-20, 2008, Pittsburgh, PA, USA