Breadth-first crawling yields high-quality pages
Source
International World Wide Web Conference
Proceedings of the 10th international conference on World Wide Web, Hong Kong
Pages: 114-118
Year of Publication: 2001
ISBN: 1-58113-348-0
Authors
Marc Najork, Compaq Systems Research Center, 130 Lytton Avenue, Palo Alto, CA
Janet L. Wiener, Compaq Systems Research Center, 130 Lytton Avenue, Palo Alto, CA
Bibliometrics
Downloads (6 Weeks): 7, Downloads (12 Months): 131, Citation Count: 57
REFERENCES
1. Krishna Bharat, Andrei Broder, Monika Henzinger, Puneet Kumar, Suresh Venkatasubramanian, The connectivity server: fast access to linkage information on the Web, Proceedings of the seventh international conference on World Wide Web 7, p.469-477, April 1998, Brisbane, Australia

3. M. Burner, Crawling towards eternity: Building an archive of the world wide web, Web Techniques Magazine 2(5):37-40, May 1997

5. Google Inc., Press release: "Google launches world's largest search engine", June 26, 2000, Available at http://www.google.com/press/pressrel/pressrelease26.html

9. P. Lyman, H. Varian, J. Dunn, A. Strygin, and K. Swearingen, How much information?, School of Information Management and Systems, Univ. of California at Berkeley, 2000, Available at http://www.sims.berkeley.edu/how-much-info

10. Mercator Home Page, http://www.research.digital.com/SRC/mercator

11. J. L. Wiener, R. Wickremesinghe, M. Burrows, K. Randall, and R. Stata, Better link compression, Manuscript in progress, Compaq Systems Research Center, 2001
CITED BY 57

Soumen Chakrabarti, Mukul M. Joshi, Kunal Punera, David M. Pennock, The structure of broad topics on the web, Proceedings of the 11th international conference on World Wide Web, May 07-11, 2002, Honolulu, Hawaii, USA

C. L. A. Clarke, G. V. Cormack, M. Laszlo, T. R. Lynam, E. L. Terra, The impact of corpus size on question answering performance, Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, August 11-15, 2002, Tampere, Finland

Nicola Capuano, Matteo Gaeta, Fabio Gasparetti, Alessandro Micarelli, Holmes: a prototype for the targeted search of information about hi-tech companies, Second international workshop on Intelligent systems design and application, p.233-238, August 07-08, 2002, Atlanta, Georgia

Ronald Fagin, Ravi Kumar, Kevin S. McCurley, Jasmine Novak, D. Sivakumar, John A. Tomlin, David P. Williamson, Searching the workplace web, Proceedings of the 12th international conference on World Wide Web, May 20-24, 2003, Budapest, Hungary

Michelangelo Diligenti, Marco Maggini, Filippo Maria Pucci, Franco Scarselli, Design of a crawler with bounded bandwidth, Proceedings of the 13th international World Wide Web conference on Alternate track papers & posters, May 19-21, 2004, New York, NY, USA

Márcio L. A. Vidal, Altigran S. da Silva, Edleno S. de Moura, João M. B. Cavalcanti, Structure-driven crawler generation by example, Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, August 06-11, 2006, Seattle, Washington, USA

B. Barla Cambazoglu, Evren Karaca, Tayfun Kucukyilmaz, Ata Turk, Cevdet Aykanat, Architecture of a grid-enabled Web search engine, Information Processing and Management: an International Journal, v.43 n.3, p.609-623, May 2007

Mark R. Meiss, Filippo Menczer, Santo Fortunato, Alessandro Flammini, Alessandro Vespignani, Ranking web sites with real user traffic, Proceedings of the international conference on Web search and web data mining, February 11-12, 2008, Palo Alto, California, USA

Hiroo Saito, Masashi Toyoda, Masaru Kitsuregawa, Kazuyuki Aihara, A large-scale study of link spam detection by graph algorithms, Proceedings of the 3rd international workshop on Adversarial information retrieval on the web, May 08, 2007, Banff, Alberta, Canada

Zhumin Chen, Jun Ma, Jingsheng Lei, Bo Yuan, Li Lian, Ling Song, A cross-language focused crawling algorithm based on multiple relevance prediction strategies, Computers & Mathematics with Applications, v.57 n.6, p.1057-1072, March 2009

Marc Spaniol, Dimitar Denev, Arturas Mazeika, Gerhard Weikum, Pierre Senellart, Data quality in web archiving, Proceedings of the 3rd workshop on Information credibility on the web, April 20, 2009, Madrid, Spain

Gary Marchionini, Chirag Shah, Christopher A. Lee, Robert Capra, Query parameters for harvesting digital video and associated contextual information, Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries, June 15-19, 2009, Austin, TX, USA
INDEX TERMS
Primary Classification:
H. Information Systems
  H.4 INFORMATION SYSTEMS APPLICATIONS
    H.4.3 Communications Applications
      Subjects: Information browsers
Additional Classification:
F. Theory of Computation
  F.2 ANALYSIS OF ALGORITHMS AND PROBLEM COMPLEXITY
    F.2.2 Nonnumerical Algorithms and Problems
      Subjects: Sorting and searching
H. Information Systems
  H.3 INFORMATION STORAGE AND RETRIEVAL
    H.3.5 On-line Information Services
      Subjects: Web-based services
  H.5 INFORMATION INTERFACES AND PRESENTATION (I.7)
    H.5.2 User Interfaces (D.2.2, H.1.2, I.3.6)
      Subjects: Interaction styles (e.g., commands, menus, forms, direct manipulation)
    H.5.3 Group and Organization Interfaces
      Subjects: Web-based interaction
General Terms: Design, Management, Measurement, Performance, Theory
Keywords: PageRank, breadth-first search, crawl order, crawling, metric, page quality
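To make the keywords "breadth-first search" and "crawl order" concrete, here is a minimal sketch of a breadth-first crawl order. It is not the authors' crawler. It assumes a hypothetical in-memory link graph (`link_graph`, mapping each URL to the URLs it links to) in place of real page fetching and link extraction, and it ignores politeness, robots.txt, and URL normalization. The function name `bfs_crawl_order` and the `max_pages` download budget are illustrative only.

```python
from collections import deque
from typing import Dict, List, Optional


def bfs_crawl_order(
    link_graph: Dict[str, List[str]],
    seeds: List[str],
    max_pages: Optional[int] = None,
) -> List[str]:
    """Return URLs in the order a breadth-first crawler would download them.

    Hypothetical setup: link_graph[url] stands in for fetching `url` and
    extracting its out-links; no network access is performed.
    """
    frontier: deque = deque()  # FIFO queue of URLs waiting to be downloaded
    seen = set()               # every URL ever enqueued, so none is queued twice

    for url in seeds:
        if url not in seen:
            seen.add(url)
            frontier.append(url)

    order: List[str] = []
    while frontier and (max_pages is None or len(order) < max_pages):
        url = frontier.popleft()  # oldest URL first => breadth-first order
        order.append(url)         # "download" the page
        for link in link_graph.get(url, []):
            if link not in seen:  # enqueue newly discovered URLs only
                seen.add(link)
                frontier.append(link)
    return order


if __name__ == "__main__":
    # Toy link graph, invented for illustration.
    toy_web = {
        "a": ["b", "c"],
        "b": ["d"],
        "c": ["d", "e"],
        "d": ["a"],
        "e": [],
    }
    print(bfs_crawl_order(toy_web, ["a"]))
    print(bfs_crawl_order(toy_web, ["a"], max_pages=3))
```

Because the frontier is a FIFO queue, pages are downloaded in nondecreasing order of their link distance from the seeds, so any download budget keeps the pages closest to the seed set.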